string function - "format" function proposal

Started by Pavel Stehuleover 15 years ago48 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

I am returning back to string functions. For me, the most important
function isn't commited still. There was discussion about "format" or
"sprintf" fuction. So I'll do a small resume.

goal: to get function that helps with formatting a message texts and
helps with building a SQL commands (used as dynamic SQL)

propsals:

* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function

Itagaki objectives to "format" function:
* there are not possibility put two parameters without a space between
* it is too simple

My objectives to sprintf function:
* it is designed to different environment than SQL - missing support
NULL, missing support for date, timestamp, boolean, ...
* it is too complex, some parameters has different meaning for different tags
* we have a "to_char" function for complex formatting now.

Now I propose a compromise - "format" function with only three tags:

%s .. some string
%i .. SQL identifier
%l .. string literal

using a NULL:

for %s NULL is transformed to empty string - like "concat"
for %i NULL raises an exception
for %l NULL is transformed to ' NULL ' string.

This system is still simple and enough.

Implemented sprintf function can be moved to "sprintf" contrib module.

comments

Regards

Pavel Stehule

#2Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#1)
Re: string function - "format" function proposal

On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function

Now I propose a compromise - "format" function with only three tags:
%s .. some string
%i  .. SQL identifier
%l  .. string literal

These are just ideas:

* Use $n, as like as PREPARE command.
It allows for us to swap arguments in any order.
SELECT format('$2 before $1', 'aaa', 'bbb')

* Call to_char() functions for each placeholder.
For example,
format('=={YYYY-MM-DD}==', tm::timestamp)
is equivalent to
'==' || to_char(tm, 'YYYY-MM-DD') || '=='
'{}' prints the input with the default format.

New languages' libraries might be of some help. LLs, C#, etc.

--
Itagaki Takahiro

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#2)
Re: string function - "format" function proposal

2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function

Now I propose a compromise - "format" function with only three tags:
%s .. some string
%i  .. SQL identifier
%l  .. string literal

These are just ideas:

* Use $n, as like as PREPARE command.
 It allows for us to swap arguments in any order.
 SELECT format('$2 before $1', 'aaa', 'bbb')

what is use case for this feature? I don't see it.

* Call to_char() functions for each placeholder.
 For example,
   format('=={YYYY-MM-DD}==', tm::timestamp)
 is equivalent to
   '==' || to_char(tm, 'YYYY-MM-DD') || '=='
 '{}' prints the input with the default format.

New languages' libraries might be of some help. LLs, C#, etc.

I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?

Regards

Pavel

Show quoted text

--
Itagaki Takahiro

#4Alvaro Herrera
alvherre@commandprompt.com
In reply to: Pavel Stehule (#3)
Re: string function - "format" function proposal

Excerpts from Pavel Stehule's message of lun ago 30 07:51:55 -0400 2010:

2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function

Now I propose a compromise - "format" function with only three tags:
%s .. some string
%i  .. SQL identifier
%l  .. string literal

These are just ideas:

* Use $n, as like as PREPARE command.
 It allows for us to swap arguments in any order.
 SELECT format('$2 before $1', 'aaa', 'bbb')

what is use case for this feature? I don't see it.

Translations :-) I haven't had a use for that but I've heard people
implements gettext of sorts in database tables. Maybe that kind of
thing would be of use here.

* Call to_char() functions for each placeholder.
 For example,
   format('=={YYYY-MM-DD}==', tm::timestamp)
 is equivalent to
   '==' || to_char(tm, 'YYYY-MM-DD') || '=='
 '{}' prints the input with the default format.

New languages' libraries might be of some help. LLs, C#, etc.

I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?

What about %{sth}? That way you don't need to escape {. The closing } would
need escaping only inside the %{} specifier, so {%{YYYY{\}MM}} prints
{2010{}08} So the above example is:

format('==%{YYYY-MM-DD}==', tm::timestamp);

Not sure about this to_char stuff though, seems too cute. You can do
the case above like this:

format('==%s==', to_char(tm::timestamp, 'YYYY-MM-DD'))

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#4)
Re: string function - "format" function proposal

2010/8/30 Alvaro Herrera <alvherre@commandprompt.com>:

Excerpts from Pavel Stehule's message of lun ago 30 07:51:55 -0400 2010:

2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function

Now I propose a compromise - "format" function with only three tags:
%s .. some string
%i  .. SQL identifier
%l  .. string literal

These are just ideas:

* Use $n, as like as PREPARE command.
 It allows for us to swap arguments in any order.
 SELECT format('$2 before $1', 'aaa', 'bbb')

what is use case for this feature? I don't see it.

Translations :-)  I haven't had a use for that but I've heard people
implements gettext of sorts in database tables.  Maybe that kind of
thing would be of use here.

* Call to_char() functions for each placeholder.
 For example,
   format('=={YYYY-MM-DD}==', tm::timestamp)
 is equivalent to
   '==' || to_char(tm, 'YYYY-MM-DD') || '=='
 '{}' prints the input with the default format.

New languages' libraries might be of some help. LLs, C#, etc.

I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?

What about %{sth}?  That way you don't need to escape {.  The closing } would
need escaping only inside the %{} specifier, so {%{YYYY{\}MM}} prints
{2010{}08}  So the above example is:

then you need escaping too :)

format('==%{YYYY-MM-DD}==', tm::timestamp);

I am not sure if this is correct -but why not

so there are possible combinations

%s .. no quoting, NULL is ''
%{} .. no quoting, NULL is NULL .. like output from to_char
%{}s .. no quoting with formatting, NULL is ''

now I have not idea about nice syntax for positional parameters - maybe
%{...}$1s or we can use a two variants for tags - not positional '%'
and positional '%', so
$1{...}s, %{...}s, $1, %s, $1s, $1{...}, %{...} can be valid tags

Regards

Pavel Stehule

Not sure about this to_char stuff though, seems too cute.  You can do
the case above like this:

format('==%s==', to_char(tm::timestamp, 'YYYY-MM-DD'))

I like an using a format like tag - there are not technical problem -
format can be taken from string and data type parameter can be known
too. But this feature can be some enhancing. The basic features are
NULL handling and right quoting.

Show quoted text

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#5)
Re: string function - "format" function proposal

Hello

attached WIP patch.

I implement only basic format's tags related to SQL: string, value,
literal, sql identifier. These tags are basic, but there are not any
break to implement any other formats or enhance a syntax. The mix with
to_char function is more complex then I expected - so I don't thinking
about it for now (there are more then one to_char function).

I don't found a nice mix for placeholders and positional placeholders
- so I propose a new special function "substitute" (in contrib) where
placeholders are positional. More - we check in function "format" if
all parameters are used - this check isn't related to positional
placeholders, this is reason for separate implementation too:

so some examples:

postgres=# select substitute('second parameter is "$2" and first
parameter is "$1"', 'first parameter', 'second parameter');
substitute
─────────────────────────────────────────────────────────────────────────────────
second parameter is "second parameter" and first parameter is "first parameter"
(1 row)

postgres=# select format('INSERT INTO %i (c1, c2, c3, c4) VALUES
(%v,%v,%v,%v)', 'my tab',1, NULL, true, 'hello');
format
────────────────────────────────────────────────────────────────
INSERT INTO "my tab" (c1, c2, c3, c4) VALUES (1,NULL,t,'hello')
(1 row)

postgres=# select format('SQL identifier %i cannot be a NULL', NULL);
ERROR: SQL identifier cannot be a NULL

postgres=# select format('NULL is %v or empty string "%s"', NULL, NULL);
format
─────────────────────────────────
NULL is NULL or empty string ""
(1 row)

%i ... sql identifier
%v ... sql value
%s ... string --- the most used tag I expect
%l ... literal

I hope so this system is clean, simple, readable and extensible

Regards

Pavel

2010/8/30 Pavel Stehule <pavel.stehule@gmail.com>:

Show quoted text

2010/8/30 Alvaro Herrera <alvherre@commandprompt.com>:

Excerpts from Pavel Stehule's message of lun ago 30 07:51:55 -0400 2010:

2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function

Now I propose a compromise - "format" function with only three tags:
%s .. some string
%i  .. SQL identifier
%l  .. string literal

These are just ideas:

* Use $n, as like as PREPARE command.
 It allows for us to swap arguments in any order.
 SELECT format('$2 before $1', 'aaa', 'bbb')

what is use case for this feature? I don't see it.

Translations :-)  I haven't had a use for that but I've heard people
implements gettext of sorts in database tables.  Maybe that kind of
thing would be of use here.

* Call to_char() functions for each placeholder.
 For example,
   format('=={YYYY-MM-DD}==', tm::timestamp)
 is equivalent to
   '==' || to_char(tm, 'YYYY-MM-DD') || '=='
 '{}' prints the input with the default format.

New languages' libraries might be of some help. LLs, C#, etc.

I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?

What about %{sth}?  That way you don't need to escape {.  The closing } would
need escaping only inside the %{} specifier, so {%{YYYY{\}MM}} prints
{2010{}08}  So the above example is:

then you need escaping too :)

format('==%{YYYY-MM-DD}==', tm::timestamp);

I am not sure if this is correct -but why not

so there are possible combinations

%s   .. no quoting, NULL is ''
%{}  .. no quoting, NULL is NULL .. like output from to_char
%{}s .. no quoting with formatting, NULL is ''

now I have not idea about nice syntax for positional parameters - maybe
%{...}$1s or we can use a two variants for tags - not positional '%'
and positional '%', so
$1{...}s, %{...}s, $1, %s, $1s, $1{...}, %{...} can be valid tags

Regards

Pavel Stehule

Not sure about this to_char stuff though, seems too cute.  You can do
the case above like this:

format('==%s==', to_char(tm::timestamp, 'YYYY-MM-DD'))

I like an using a format like tag - there are not technical problem -
format can be taken from string and data type parameter can be known
too. But this feature can be some enhancing. The basic features are
NULL handling and right quoting.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7A.M.
agentm@themactionfaction.com
In reply to: Pavel Stehule (#6)
Re: string function - "format" function proposal

On Aug 31, 2010, at 5:07 PM, Pavel Stehule wrote:

Hello

attached WIP patch.

I implement only basic format's tags related to SQL: string, value,
literal, sql identifier. These tags are basic, but there are not any
break to implement any other formats or enhance a syntax. The mix with
to_char function is more complex then I expected - so I don't thinking
about it for now (there are more then one to_char function).

<snip>

It would be pretty handy if plpgsql EXECUTE could operate like this with USING to support identifiers.

Cheers,
M

#8David Fetter
david@fetter.org
In reply to: Pavel Stehule (#6)
Re: string function - "format" function proposal

On Tue, Aug 31, 2010 at 11:07:40PM +0200, Pavel Stehule wrote:

Hello

attached WIP patch.

I don't see it attached. Is it just me?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#9Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#6)
Re: string function - "format" function proposal

On Wed, Sep 1, 2010 at 6:07 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I don't found a nice mix for placeholders and positional placeholders

How about %pos$format, used in C-printf()? It might be
only in Linux's libc.

printf("<%2$s> <%1$d>\n", 123, "abc");
=> <abc> <123>
http://linux.die.net/man/3/printf

%i ... sql identifier
%v ... sql value
%s ... string --- the most used tag I expect
%l ... literal

Looks good designed. I have a couple of comments and questions:

* There is no examples for %l. What's the difference from %v and %s?
If it always quotes, how does it work? Like as quote_literal()
or quote_nullable()?

* %v quotes text values (and maybe all non-numeric values) with
single quotes, but doesn't numeric values. How do we determine
the difference? By type oid?

* %v also doesn't quote boolean values, but t and f are not valid.
You should use true and false (or 't' and 'f') for the cases.
(So, your "INSERT INTO" example is broken.)

--
Itagaki Takahiro

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: David Fetter (#8)
1 attachment(s)
Re: string function - "format" function proposal

2010/9/1 David Fetter <david@fetter.org>:

On Tue, Aug 31, 2010 at 11:07:40PM +0200, Pavel Stehule wrote:

Hello

attached WIP patch.

I don't see it attached.  Is it just me?

sorry, it was at 1 ofter midnight

Regards

Pavel

Show quoted text

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

ft02.difftext/x-patch; charset=US-ASCII; name=ft02.diffDownload
*** ./src/backend/tsearch/dict_ispell.c.orig	2010-08-23 09:16:49.000000000 +0200
--- ./src/backend/tsearch/dict_ispell.c	2010-08-31 23:46:00.178669635 +0200
***************
*** 37,113 ****
  				dictloaded = false,
  				stoploaded = false;
  	ListCell   *l;
  
  	d = (DictISpell *) palloc0(sizeof(DictISpell));
  
! 	foreach(l, dictoptions)
  	{
! 		DefElem    *defel = (DefElem *) lfirst(l);
! 
! 		if (pg_strcasecmp(defel->defname, "DictFile") == 0)
  		{
! 			if (dictloaded)
  				ereport(ERROR,
  						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 						 errmsg("multiple DictFile parameters")));
! 			NIImportDictionary(&(d->obj),
! 							 get_tsearch_config_filename(defGetString(defel),
! 														 "dict"));
! 			dictloaded = true;
  		}
! 		else if (pg_strcasecmp(defel->defname, "AffFile") == 0)
  		{
! 			if (affloaded)
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 						 errmsg("multiple AffFile parameters")));
! 			NIImportAffixes(&(d->obj),
! 							get_tsearch_config_filename(defGetString(defel),
! 														"affix"));
! 			affloaded = true;
  		}
! 		else if (pg_strcasecmp(defel->defname, "StopWords") == 0)
  		{
! 			if (stoploaded)
! 				ereport(ERROR,
! 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 						 errmsg("multiple StopWords parameters")));
! 			readstoplist(defGetString(defel), &(d->stoplist), lowerstr);
! 			stoploaded = true;
  		}
  		else
  		{
  			ereport(ERROR,
  					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 					 errmsg("unrecognized Ispell parameter: \"%s\"",
! 							defel->defname)));
  		}
! 	}
  
- 	if (affloaded && dictloaded)
- 	{
- 		NISortDictionary(&(d->obj));
- 		NISortAffixes(&(d->obj));
- 	}
- 	else if (!affloaded)
- 	{
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- 				 errmsg("missing AffFile parameter")));
- 	}
- 	else
- 	{
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- 				 errmsg("missing DictFile parameter")));
  	}
  
  	MemoryContextDeleteChildren(CurrentMemoryContext);
  	
  	MemoryContextStats(CurrentMemoryContext);
  	
  	
- 
  	PG_RETURN_POINTER(d);
  }
  
--- 37,132 ----
  				dictloaded = false,
  				stoploaded = false;
  	ListCell   *l;
+ 	int i;
  
  	d = (DictISpell *) palloc0(sizeof(DictISpell));
+ 	
+ 	d->obj.stream = fopen("/tmp/xxx.ft", "r");
+ 	d->obj.mode = 'r';
  
! 	if (d->obj.mode == 'r')
  	{
! 		readSPDict(d->obj.stream, &d->obj);
! 		readAffix(d->obj.stream, &d->obj);
! 		postProcessAffixes(&d->obj);
! 		readStopList(d->obj.stream, &d->stoplist);
! 	}
! 	else
! 	{
! 		foreach(l, dictoptions)
  		{
! 			DefElem    *defel = (DefElem *) lfirst(l);
! 
! 			if (pg_strcasecmp(defel->defname, "DictFile") == 0)
! 			{
! 				if (dictloaded)
! 					ereport(ERROR,
! 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 							 errmsg("multiple DictFile parameters")));
! 				NIImportDictionary(&(d->obj),
! 								 get_tsearch_config_filename(defGetString(defel),
! 															 "dict"));
! 				dictloaded = true;
! 			}
! 			else if (pg_strcasecmp(defel->defname, "AffFile") == 0)
! 			{
! 				if (affloaded)
! 					ereport(ERROR,
! 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 							 errmsg("multiple AffFile parameters")));
! 				NIImportAffixes(&(d->obj),
! 								get_tsearch_config_filename(defGetString(defel),
! 															"affix"));
! 				affloaded = true;
! 			}
! 			else if (pg_strcasecmp(defel->defname, "StopWords") == 0)
! 			{
! 				if (stoploaded)
! 					ereport(ERROR,
! 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 							 errmsg("multiple StopWords parameters")));
! 				readstoplist(defGetString(defel), &(d->stoplist), lowerstr);
! 				stoploaded = true;
! 				
! 			}
! 			else
! 			{
  				ereport(ERROR,
  						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 						 errmsg("unrecognized Ispell parameter: \"%s\"",
! 								defel->defname)));
! 			}
  		}
! 
! 		if (affloaded && dictloaded)
  		{
! 			NISortDictionary(&(d->obj));
! 			NISortAffixes(&(d->obj));
  		}
! 		else if (!affloaded)
  		{
! 			ereport(ERROR,
! 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 					 errmsg("missing AffFile parameter")));
  		}
  		else
  		{
  			ereport(ERROR,
  					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
! 					 errmsg("missing DictFile parameter")));
  		}
! 		
! 		if (d->obj.stream != NULL && d->obj.mode == 'w')
! 			outStopList(d->obj.stream, &d->stoplist);
  
  	}
  
  	MemoryContextDeleteChildren(CurrentMemoryContext);
  	
  	MemoryContextStats(CurrentMemoryContext);
  	
+ 	fclose(d->obj.stream);
  	
  	PG_RETURN_POINTER(d);
  }
  
*** ./src/backend/tsearch/spell.c.orig	2010-01-02 17:57:53.000000000 +0100
--- ./src/backend/tsearch/spell.c	2010-08-31 23:55:16.054672520 +0200
***************
*** 11,23 ****
   *
   *-------------------------------------------------------------------------
   */
- 
  #include "postgres.h"
  
  #include "tsearch/dicts/spell.h"
  #include "tsearch/ts_locale.h"
  #include "utils/memutils.h"
  
  
  /*
   * Initialization requires a lot of memory that's not needed
--- 11,26 ----
   *
   *-------------------------------------------------------------------------
   */
  #include "postgres.h"
  
  #include "tsearch/dicts/spell.h"
  #include "tsearch/ts_locale.h"
+ #include "tsearch/ts_public.h"
  #include "utils/memutils.h"
  
+ #include <stdio.h>
+ #include <time.h>
+ 
  
  /*
   * Initialization requires a lot of memory that's not needed
***************
*** 28,36 ****
--- 31,367 ----
   */
  static MemoryContext tmpCtx = NULL;
  
+ static void *prealloc_mem = NULL;
+ static Size prealloc_free_size;
+ 
+ static void checkTmpCtx(void);
+ 
  #define tmpalloc(sz)  MemoryContextAlloc(tmpCtx, (sz))
  #define tmpalloc0(sz)  MemoryContextAllocZero(tmpCtx, (sz))
  
+ #define WRITE_BINARY(buff, stream)  \
+ 	do { \
+ 		if (fwrite(&(buff), sizeof(buff), 1, stream) != 1) \
+ 			elog(ERROR, "cannot to write to prepared dictionary file"); \
+ 	} while (0);
+ 
+ #define WRITE_STRING(buff, stream) \
+ 	do { \
+ 		int len = -1; \
+ 		if ((buff) != NULL) \
+ 		{ \
+ 			int len = strlen(buff) + 1; \
+ 			WRITE_BINARY(len, stream); \
+ 			if (fwrite(buff, len, 1, stream) != 1) \
+ 				elog(ERROR, "cannot to write to prepared dictionary file"); \
+ 		} \
+ 		else \
+ 		{ \
+ 			WRITE_BINARY(len, stream); \
+ 		} \
+ 	} while (0);
+ 
+ #define WRITE_BINARY_STRING(buff, size, stream) \
+ 	do { \
+ 		if (fwrite(buff, size, 1, stream) != 1) \
+ 			elog(ERROR, "cannot to write to prepared dictionary file"); \
+ 	} while (0);
+ 
+ #define READ_BINARY(buff, stream)  \
+ 	do { \
+ 		if (fread(&(buff), sizeof(buff), 1, stream) != 1) \
+ 			elog(ERROR, "cannot to load a prepared dictionary file"); \
+ 	} while (0)
+ 
+ #define READ_STRING(target, stream)  \
+ 	do { \
+ 		int len; \
+ 		READ_BINARY(len, stream); \
+ 		if (len != -1) \
+ 		{ \
+ 			target = (char *) palloc(len); \
+ 			if (fread(target, len, 1, stream) != 1) \
+ 				elog(ERROR, "cannot to load a prepared dictionary file"); \
+ 		} \
+ 		else \
+ 			target = NULL; \
+ 	} while (0)
+ 
+ #define READ_BINARY_STRING(buff, size, stream) \
+ 	do { \
+ 		if (fread(buff, size, 1, stream) != 1) \
+ 			elog(ERROR, "cannot to load a prepared dictionary file"); \
+ 	} while(0);
+ 
+ /*
+  * spell dictionary uses a thousands SPNodes. These nodes are never
+  * individually released, so we can pass by memory context managament 
+  * and solve a interesting size of memory.
+  */
+ static SPNode *
+ allocSPNode(int nchar)
+ {
+ 	Size size = MAXALIGN(SPNHDRSZ + nchar * sizeof(SPNodeData));
+ 	void *ret;
+ 
+ 	/* use a prealloc_mem only for small requests */
+ 	if (size > ALLOCSET_DEFAULT_INITSIZE / 3)
+ 		return palloc(size);
+ 
+ 	if (prealloc_mem == NULL || size > prealloc_free_size)
+ 	{
+ 		prealloc_mem = palloc(ALLOCSET_DEFAULT_INITSIZE);
+ 		prealloc_free_size = ALLOCSET_DEFAULT_INITSIZE;
+ 	}
+ 	
+ 	Assert(prealloc_mem != NULL);
+ 	Assert(prealloc_mem == (void *) MAXALIGN(prealloc_mem));
+ 	
+ 	ret = memset(prealloc_mem, 0, size);
+ 	
+ 	/* reduce a used block from preallocated memory */
+ 	prealloc_free_size -= size;
+ 	prealloc_mem = (char *) prealloc_mem + size;
+ 	
+ 	return ret;
+ }
+ 
+ /*
+  * Parsing a spell dictionary is slow, so we must to mimimalize
+  * the number of this task. One possibility is serialisation
+  * and deseralisation of Ispell dictionary.
+  */
+ static void 
+ outSPNode(FILE *stream, SPNode *node)
+ {
+ 	int i;
+ 	uint32 length = node->length;
+ 	
+ 	WRITE_BINARY(length, stream);
+ 	
+ 	for (i = 0; i < node->length; i++)
+ 	{
+ 		SPNodeData *data = &node->data[i];
+ 		uint32 aux = data->val | data->isword <<  8 
+ 				| data->compoundflag << 9 | data->affix << 13;
+ 				
+ 		WRITE_BINARY(aux, stream);
+ 		
+ 		if (data->node)
+ 			outSPNode(stream, data->node);
+ 		else
+ 		{
+ 			length = 0;
+ 			WRITE_BINARY(length, stream);
+ 		}
+ 	}
+ }
+ 
+ static SPNode *
+ readSPNode(FILE *stream)
+ {
+ 	int i;
+ 	uint32 length;
+ 	SPNode *node;
+ 	
+ 	READ_BINARY(length, stream);
+ 	
+ 	/* there are not other node */
+ 	if (length == 0)
+ 		return NULL;
+ 	
+ 	node = allocSPNode(length);
+ 	node->length = length;
+ 	
+ 	for (i = 0; i < node->length; i++)
+ 	{
+ 		SPNodeData *data = &node->data[i];
+ 		uint32 aux;
+ 		
+ 		READ_BINARY(aux, stream);
+ 		
+ 		data->val = aux & 0xFF;
+ 		data->isword = aux >> 8 & 1;
+ 		data->compoundflag = aux >> 9 & 0xF;
+ 		data->affix = aux >> 13 & 0x7FFFF;
+ 		
+ 		data->node = readSPNode(stream);
+ 	}
+ 
+ 	return node;
+ }
+ 
+ static void 
+ outSPDict(FILE *stream, IspellDict *Conf)
+ {
+ 	int i;
+ 
+ 	WRITE_BINARY(Conf->nAffixData, stream);
+ 
+ 	for (i = 0; i < Conf->nAffixData; i++)
+ 	{
+ 		WRITE_STRING(Conf->AffixData[i], stream);
+ 	}
+ 	
+ 	outSPNode(stream, Conf->Dictionary);
+ }
+ 
+ void
+ readSPDict(FILE *stream, IspellDict *Conf)
+ {
+ 	int i;
+ 
+ 	checkTmpCtx();
+ 
+ 	READ_BINARY(Conf->nAffixData, stream);
+ 	
+ 	Conf->AffixData = (char **) palloc(Conf->nAffixData * sizeof(char *));
+ 	
+ 	for (i = 0; i < Conf->nAffixData; i++)
+ 	{
+ 		READ_STRING(Conf->AffixData[i], stream);
+ 	}
+ 	
+ 	Conf->Dictionary = readSPNode(stream);
+ }
+ 
+ static void 
+ outRegisNode(FILE *stream, RegisNode *node)
+ {
+ 	do
+ 	{
+ 		int len = node->len;
+ 		uint32 aux = node->type | node->len << 2;
+ 		
+ 		WRITE_BINARY(len, stream);
+ 		WRITE_BINARY(aux, stream);
+ 		WRITE_BINARY_STRING(&node->data, len, stream);
+ 		
+ 		node = node->next;
+ 		if (!node)
+ 		{
+ 			/* append end tag */
+ 			len = 0;
+ 			WRITE_BINARY(len, stream);
+ 		}
+ 		
+ 	} while (node != NULL);
+ }
+ 
+ static RegisNode *
+ readRegisNode(FILE *stream)
+ {
+ 	int len;
+ 	RegisNode *result = NULL;
+ 	RegisNode *node,
+ 			*prev = NULL;
+ 
+ 	do 
+ 	{
+ 		READ_BINARY(len, stream);
+ 		if (len > 0)
+ 		{
+ 			uint32 aux;
+ 		
+ 			node = (RegisNode *) palloc0(RNHDRSZ + len + 1);
+ 			if (result == NULL)
+ 				result = node;
+ 			else
+ 				prev->next = node;
+ 			
+ 			READ_BINARY(aux, stream);
+ 			node->type = aux & 3;
+ 			node->len = aux >> 2 & 65535;
+ 			READ_BINARY_STRING(node->data, len, stream);
+ 			prev = node;
+ 		}
+ 	} while (len > 0);
+ 
+ 	return result;
+ }
+ 
+ static void
+ outRegis(FILE *stream, Regis *regis)
+ {
+ 	uint32 aux = regis->issuffix | regis->nchar << 1;
+ 	
+ 	WRITE_BINARY(aux, stream);
+ 	outRegisNode(stream, regis->node);
+ }
+ 
+ static void 
+ readRegis(FILE *stream, Regis *regis)
+ {
+ 	uint32 aux;
+ 	
+ 	READ_BINARY(aux, stream);
+ 	regis->issuffix = aux & 1;
+ 	regis->nchar = aux >> 1 & 65535;
+ 	regis->node = readRegisNode(stream);
+ }
+ 
+ static void 
+ outAFFIX(FILE *stream, AFFIX *aff)
+ {
+ 	uint32 aux = aff->flag | aff->type << 8 | aff->flagflags << 9 |
+ 				aff->issimple << 16 | aff->isregis << 17 | aff->replen << 18;
+ 	
+ 	WRITE_BINARY(aux, stream);
+ 	WRITE_STRING(aff->find, stream);
+ 	WRITE_STRING(aff->repl, stream);
+ 	
+ 	if (aff->isregis)
+ 		outRegis(stream, &aff->reg.regis);
+ }
+ 
+ static void
+ readAFFIX(FILE *stream, AFFIX *aff)
+ {
+ 	uint32 aux;
+ 	
+ 	checkTmpCtx();
+ 	
+ 	READ_BINARY(aux, stream);
+ 	aff->flag = aux & 255;
+ 	aff->type = aux >> 8 & 1;
+ 	aff->flagflags = aux >> 9 & 127;
+ 	aff->issimple = aux >> 16 & 1;
+ 	aff->isregis = aux >> 17 & 1;
+ 	aff->replen = (aux >> 18) & 16383;
+ 	
+ 	READ_STRING(aff->find, stream);
+ 	READ_STRING(aff->repl, stream);
+ 	
+ 	if (aff->isregis)
+ 		readRegis(stream, &aff->reg.regis);
+ }
+ 
+ static void
+ outAffix(FILE *stream, IspellDict *Conf)
+ {
+ 	int	i;
+ 
+ 	WRITE_BINARY(Conf->naffixes, stream);
+ 	for (i = 0; i < Conf->naffixes; i++)
+ 	{
+ 		outAFFIX(stream, &Conf->Affix[i]);
+ 	}
+ }
+ 
+ void
+ readAffix(FILE *stream, IspellDict *Conf)
+ {
+ 	int i;
+ 	
+ 	READ_BINARY(Conf->naffixes, stream);
+ 	
+ 	Conf->Affix = (AFFIX *) palloc(Conf->naffixes * sizeof(AFFIX));
+ 	for (i = 0; i < Conf->naffixes; i++)
+ 	{
+ 		readAFFIX(stream, &Conf->Affix[i]);
+ 	}
+ }
+ 
  static void
  checkTmpCtx(void)
  {
***************
*** 63,68 ****
--- 394,424 ----
  	return dst;
  }
  
+ void 
+ outStopList(FILE *stream, StopList *s)
+ {
+ 	int	i;
+ 	
+ 	WRITE_BINARY(s->len, stream);
+ 	for (i = 0; i < s->len; i++)
+ 	{
+ 		WRITE_STRING(s->stop[i], stream);
+ 	}
+ }
+ 
+ void
+ readStopList(FILE *stream, StopList *s)
+ {
+ 	int i;
+ 	
+ 	READ_BINARY(s->len, stream);
+ 	s->stop = (char **) palloc(s->len * sizeof(char *));
+ 	for(i = 0; i < s->len; i++)
+ 	{
+ 		READ_STRING(s->stop[i], stream);
+ 	}
+ }
+ 
  #define MAX_NORM 1024
  #define MAXNORMLEN 256
  
***************
*** 252,258 ****
  	tsearch_readline_end(&trst);
  }
  
- 
  static int
  FindWord(IspellDict *Conf, const char *word, int affixflag, int flag)
  {
--- 608,613 ----
***************
*** 261,266 ****
--- 616,623 ----
  			   *StopHigh,
  			   *StopMiddle;
  	uint8	   *ptr = (uint8 *) word;
+ 	static int xx = 0;
+ 
  
  	flag &= FF_DICTFLAGMASK;
  
***************
*** 270,276 ****
--- 627,635 ----
  		StopHigh = node->data + node->length;
  		while (StopLow < StopHigh)
  		{
+ 
  			StopMiddle = StopLow + ((StopHigh - StopLow) >> 1);
+ 			
  			if (StopMiddle->val == *ptr)
  			{
  				if (*(ptr + 1) == '\0' && StopMiddle->isword)
***************
*** 321,326 ****
--- 680,686 ----
  	}
  
  	Affix = Conf->Affix + Conf->naffixes;
+ 	Affix->mask = pstrdup(mask);
  
  	if (strcmp(mask, ".") == 0)
  	{
***************
*** 878,884 ****
  	if (!nchar)
  		return NULL;
  
! 	rs = (SPNode *) palloc0(SPNHDRSZ + nchar * sizeof(SPNodeData));
  	rs->length = nchar;
  	data = rs->data;
  
--- 1238,1244 ----
  	if (!nchar)
  		return NULL;
  
! 	rs = allocSPNode(nchar);
  	rs->length = nchar;
  	data = rs->data;
  
***************
*** 987,992 ****
--- 1347,1358 ----
  	Conf->Dictionary = mkSPNode(Conf, 0, Conf->nspell, 0);
  
  	Conf->Spell = NULL;
+ 	
+ 	/* serialize a dictionary */
+ 	if (Conf->stream && Conf->mode == 'w') 
+ 	{
+ 		outSPDict(Conf->stream, Conf);
+ 	}
  }
  
  static AffixNode *
***************
*** 1000,1012 ****
  	int			lownew = low;
  	int			naff;
  	AFFIX	  **aff;
! 
  	for (i = low; i < high; i++)
  		if (Conf->Affix[i].replen > level && lastchar != GETCHAR(Conf->Affix + i, level, type))
  		{
  			nchar++;
  			lastchar = GETCHAR(Conf->Affix + i, level, type);
  		}
  
  	if (!nchar)
  		return NULL;
--- 1366,1380 ----
  	int			lownew = low;
  	int			naff;
  	AFFIX	  **aff;
! 	
  	for (i = low; i < high; i++)
+ 	{
  		if (Conf->Affix[i].replen > level && lastchar != GETCHAR(Conf->Affix + i, level, type))
  		{
  			nchar++;
  			lastchar = GETCHAR(Conf->Affix + i, level, type);
  		}
+ 	}
  
  	if (!nchar)
  		return NULL;
***************
*** 1092,1097 ****
--- 1460,1466 ----
  		return;
  
  	Affix->data->aff = (AFFIX **) palloc(sizeof(AFFIX *) * cnt);
+ 	
  	Affix->data->naff = (uint32) cnt;
  
  	cnt = 0;
***************
*** 1130,1135 ****
--- 1499,1555 ----
  
  	if (Conf->naffixes > 1)
  		qsort((void *) Conf->Affix, Conf->naffixes, sizeof(AFFIX), cmpaffix);
+ 
+ 	/* Serialize affix */
+ 	if (Conf->stream && Conf->mode == 'w')
+ 	{
+ 		outAffix(Conf->stream, Conf);
+ 	}
+ 		
+ 	Conf->CompoundAffix = ptr = (CMPDAffix *) palloc(sizeof(CMPDAffix) * Conf->naffixes);
+ 	ptr->affix = NULL;
+ 
+ 	for (i = 0; i < Conf->naffixes; i++)
+ 	{
+ 		Affix = &(((AFFIX *) Conf->Affix)[i]);
+ 		if (Affix->type == FF_SUFFIX && i < firstsuffix)
+ 			firstsuffix = i;
+ 
+ 		if ((Affix->flagflags & FF_COMPOUNDFLAG) && Affix->replen > 0 &&
+ 			isAffixInUse(Conf, (char) Affix->flag))
+ 		{
+ 			if (ptr == Conf->CompoundAffix ||
+ 				ptr->issuffix != (ptr - 1)->issuffix ||
+ 				strbncmp((const unsigned char *) (ptr - 1)->affix,
+ 						 (const unsigned char *) Affix->repl,
+ 						 (ptr - 1)->len))
+ 			{
+ 				/* leave only unique and minimals suffixes */
+ 				ptr->affix = Affix->repl;
+ 				ptr->len = Affix->replen;
+ 				ptr->issuffix = (Affix->type == FF_SUFFIX) ? true : false;
+ 				ptr++;
+ 			}
+ 		}
+ 	}
+ 	ptr->affix = NULL;
+ 	Conf->CompoundAffix = (CMPDAffix *) repalloc(Conf->CompoundAffix, sizeof(CMPDAffix) * (ptr - Conf->CompoundAffix + 1));
+ 
+ 	Conf->Prefix = mkANode(Conf, 0, firstsuffix, 0, FF_PREFIX);
+ 	Conf->Suffix = mkANode(Conf, firstsuffix, Conf->naffixes, 0, FF_SUFFIX);
+ 	mkVoidAffix(Conf, true, firstsuffix);
+ 	mkVoidAffix(Conf, false, firstsuffix);
+ }
+ 
+ 
+ void 
+ postProcessAffixes(IspellDict *Conf)
+ {
+ 	AFFIX	   *Affix;
+ 	size_t		i;
+ 	CMPDAffix  *ptr;
+ 	int			firstsuffix = Conf->naffixes;
+ 	
  	Conf->CompoundAffix = ptr = (CMPDAffix *) palloc(sizeof(CMPDAffix) * Conf->naffixes);
  	ptr->affix = NULL;
  
***************
*** 1172,1177 ****
--- 1592,1598 ----
  			   *StopHigh,
  			   *StopMiddle;
  	uint8 symbol;
+ 	static int xx = 0;
  
  	if (node->isvoid)
  	{							/* search void affixes */
***************
*** 1188,1199 ****
  		{
  			StopMiddle = StopLow + ((StopHigh - StopLow) >> 1);
  			symbol = GETWCHAR(word, wrdlen, *level, type);
! 
  			if (StopMiddle->val == symbol)
  			{
  				(*level)++;
  				if (StopMiddle->naff)
  					return StopMiddle;
  				node = StopMiddle->node;
  				break;
  			}
--- 1609,1622 ----
  		{
  			StopMiddle = StopLow + ((StopHigh - StopLow) >> 1);
  			symbol = GETWCHAR(word, wrdlen, *level, type);
! 			
  			if (StopMiddle->val == symbol)
  			{
  				(*level)++;
  				if (StopMiddle->naff)
+ 				{
  					return StopMiddle;
+ 				}
  				node = StopMiddle->node;
  				break;
  			}
***************
*** 1372,1378 ****
  	while (snode)
  	{
  		int			baselen = 0;
- 
  		/* find possible suffix */
  		suffix = FindAffixes(snode, word, wrdlen, &slevel, FF_SUFFIX);
  		if (!suffix)
--- 1795,1800 ----
***************
*** 1402,1408 ****
  							/* prefix success */
  							int			ff = (prefix->aff[j]->flagflags & suffix->aff[i]->flagflags & FF_CROSSPRODUCT) ?
  							0 : prefix->aff[j]->flag;
- 
  							if (FindWord(Conf, pnewword, ff, flag))
  								cur += addToResult(forms, cur, pnewword);
  						}
--- 1824,1829 ----
***************
*** 1420,1425 ****
--- 1841,1849 ----
  		pfree(forms);
  		return (NULL);
  	}
+ 	
+ 	cur = forms;
+ 	
  	return (forms);
  }
  
*** ./src/include/tsearch/dicts/spell.h.orig	2010-08-31 23:46:38.653669628 +0200
--- ./src/include/tsearch/dicts/spell.h	2010-08-31 23:46:47.469669487 +0200
***************
*** 161,166 ****
--- 161,168 ----
  
  	unsigned char flagval[256];
  	bool		usecompound;
+ 	FILE		*stream;
+ 	char mode;
  } IspellDict;
  
  extern TSLexeme *NINormalizeWord(IspellDict *Conf, char *word);
*** ./src/include/tsearch/ts_public.h.orig	2010-01-02 17:58:09.000000000 +0100
--- ./src/include/tsearch/ts_public.h	2010-08-31 23:46:00.185669425 +0200
***************
*** 78,83 ****
--- 78,87 ----
  			 char *(*wordop) (const char *));
  extern bool searchstoplist(StopList *s, char *key);
  
+ extern void outStopList(FILE *stream, StopList *s);
+ extern void readStopList(FILE *stream, StopList *s);
+ 
+ 
  /*
   * Interface with dictionaries
   */
#11Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#9)
Re: string function - "format" function proposal

2010/9/1 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Wed, Sep 1, 2010 at 6:07 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I don't found a nice mix for placeholders and positional placeholders

How about %pos$format, used in C-printf()? It might be
only in Linux's libc.

printf("<%2$s> <%1$d>\n", 123, "abc");
=> <abc> <123>
http://linux.die.net/man/3/printf

same syntax I designed and didn't implement, because it isn't too
readable (my opinion). But I am not against - just thinking so
separate function can be better.

%i ... sql identifier
%v ... sql value
%s ... string --- the most used tag I expect
%l ... literal

Looks good designed.  I have a couple of comments and questions:

* There is no examples for %l. What's the difference from %v and %s?
 If it always quotes, how does it work? Like as quote_literal()
 or quote_nullable()?

%l is always quoted

it is designed for syntax: SELECT integer '10';

%s versus %v ... %s is never quoted X %v is quoted when it is
necessary, NULL is showed as empty string for %s and as "NULL" for %v.
%s is used for messages (the behave is same like "concat"), %v is used
for SQL statement building

* %v quotes text values (and maybe all non-numeric values) with
 single quotes, but doesn't numeric values. How do we determine
 the difference? By type oid?

every datatype has typecategory attribute

typname │ typcategory
─────────────────┼─────────────
int8 │ N
int2 │ N
int4 │ N
regproc │ N
oid │ N
float4 │ N
float8 │ N
money │ N
numeric │ N
regprocedure │ N
regoper │ N
regoperator │ N
regclass │ N
regtype │ N
regconfig │ N
regdictionary │ N
cardinal_number │ N

so these types are unquoted

* %v also doesn't quote boolean values, but t and f are not valid.
 You should use true and false (or 't' and 'f') for the cases.
 (So, your "INSERT INTO" example is broken.)

you have a true - it should be fixed

Regards

Pavel

Show quoted text

--
Itagaki Takahiro

#12Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#11)
Re: string function - "format" function proposal

On Wed, Sep 1, 2010 at 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

* %v also doesn't quote boolean values, but t and f are not valid.
 You should use true and false (or 't' and 'f') for the cases.

you have a true - it should be fixed

I found quote_literal() prints boolean values as 'true' or 'false'.
It uses casting to text type rather than calling output function.
OTOH, format functions (and concat funcs) use output functions.

Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.

--
Itagaki Takahiro

#13Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#12)
Re: string function - "format" function proposal

2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Wed, Sep 1, 2010 at 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

* %v also doesn't quote boolean values, but t and f are not valid.
 You should use true and false (or 't' and 'f') for the cases.

you have a true - it should be fixed

I found quote_literal() prints boolean values as 'true' or 'false'.
It uses casting to text type rather than calling output function.
OTOH, format functions (and concat funcs) use output functions.

Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.

Personally I prefer casting to text - it allows some later
customizations. And it's more consistent with || operator. So the
functions concat* should to be fixed.

Regards

Pavel

Show quoted text

--
Itagaki Takahiro

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#13)
Re: string function - "format" function proposal

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.

Personally I prefer casting to text -

No, you need to use the I/O functions. Not every type is guaranteed to
have a cast to text.

iit allows some later
customizations. And it's more consistent with || operator.

I don't buy either of those arguments.

regards, tom lane

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#14)
Re: string function - "format" function proposal

2010/9/6 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.

Personally I prefer casting to text -

No, you need to use the I/O functions.  Not every type is guaranteed to
have a cast to text.

iit allows some later
customizations. And it's more consistent with || operator.

I don't buy either of those arguments.

can we use a both? like plpgsql? First check cast to text, and second
use a IO functions?

Why I think so this is useful - sometimes people asked some GUC for
formatting date, boolean and other. If these functions try to use a
cast to text first, then there is some space for customization via
custom cast functions.

Regards

Pavel Stehule

Show quoted text

                       regards, tom lane

#16Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Tom Lane (#14)
Re: string function - "format" function proposal

On Mon, Sep 6, 2010 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No, you need to use the I/O functions.  Not every type is guaranteed to
have a cast to text.

One issue is that Pavel want to generate valid SQL statement using
%v format. Boolean values are printed as t or f, so the unquoted
values are not valid syntax.

If we only use output functions, boolean values should be written as
't' or 'f' (single-quoted), Only numeric values can be unquoted on %v.

--
Itagaki Takahiro

#17Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#16)
Re: string function - "format" function proposal

2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Sep 6, 2010 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No, you need to use the I/O functions.  Not every type is guaranteed to
have a cast to text.

One issue is that Pavel want to generate valid SQL statement using
%v format. Boolean values are printed as t or f, so the unquoted
values are not valid syntax.

we can format some types directly - but I like idea of casting to text
because there is space for users.

Pavel

Show quoted text

If we only use output functions, boolean values should be written as
't' or 'f' (single-quoted), Only numeric values can be unquoted on %v.

--
Itagaki Takahiro

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Itagaki Takahiro (#16)
Re: string function - "format" function proposal

Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:

On Mon, Sep 6, 2010 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No, you need to use the I/O functions.  Not every type is guaranteed to
have a cast to text.

One issue is that Pavel want to generate valid SQL statement using
%v format. Boolean values are printed as t or f, so the unquoted
values are not valid syntax.

So? You'd need to quote the values anyway, in general. If you want
something that will be valid SQL you'd better include the functionality
of quote_literal() in it.

If we only use output functions, boolean values should be written as
't' or 'f' (single-quoted), Only numeric values can be unquoted on %v.

I'm not sure that it's a good idea to have any type-specific special
cases. Failing to quote numeric values will bring in the whole set of
issues about how the parser initially types numeric constants, as in the
other thread over the weekend.

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#15)
Re: string function - "format" function proposal

Pavel Stehule <pavel.stehule@gmail.com> writes:

Why I think so this is useful - sometimes people asked some GUC for
formatting date, boolean and other. If these functions try to use a
cast to text first, then there is some space for customization via
custom cast functions.

This is basically nonsense. If you don't control a type's output
function, you don't control its cast to text either. Nor do I think
it's a good idea to encourage people to make their casts to text operate
differently from their output functions. We have that one wart in
boolean casting because the SQL standard specifies the result of cast to
text and it's different from our historical practice in the bool output
function --- but it is a wart, not something we should encourage people
to emulate.

regards, tom lane

#20Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Tom Lane (#18)
Re: string function - "format" function proposal

On Mon, Sep 6, 2010 at 11:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So?  You'd need to quote the values anyway, in general.  If you want
something that will be valid SQL you'd better include the functionality
of quote_literal() in it.

I'm not sure that it's a good idea to have any type-specific special
cases.

As I remember, the original motivation of %v formatter is
some DBMSes don't like quoted numeric literals. However,
Postgres accepts quoted numerics, and we're developing Postgres.

So, our consensus would be %v formatter should be removed
completely from the format function.

--
Itagaki Takahiro

#21Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#20)
Re: string function - "format" function proposal

2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Sep 6, 2010 at 11:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So?  You'd need to quote the values anyway, in general.  If you want
something that will be valid SQL you'd better include the functionality
of quote_literal() in it.

I'm not sure that it's a good idea to have any type-specific special
cases.

As I remember, the original motivation of %v formatter is
some DBMSes don't like quoted numeric literals. However,
Postgres accepts quoted numerics, and we're developing Postgres.

So, our consensus would be %v formatter should be removed
completely from the format function.

I think so tag that quotes all without numbers can be very useful, but
it isn't too much important for me. I can live without them.

Regards

Pavel

Show quoted text

--
Itagaki Takahiro

#22Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#21)
2 attachment(s)
Re: string function - "format" function proposal

Hello

I am sending a updated version.

changes:

* tag %v removed from format function,
* proprietary tags %lq a iq removed from sprintf
* code cleaned

patch divided to two parts - format function and stringfunc (contains
sprintf function and substitute function)

Regards

Pavel Stehule

2010/9/6 Pavel Stehule <pavel.stehule@gmail.com>:

Show quoted text

2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Mon, Sep 6, 2010 at 11:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

So?  You'd need to quote the values anyway, in general.  If you want
something that will be valid SQL you'd better include the functionality
of quote_literal() in it.

I'm not sure that it's a good idea to have any type-specific special
cases.

As I remember, the original motivation of %v formatter is
some DBMSes don't like quoted numeric literals. However,
Postgres accepts quoted numerics, and we're developing Postgres.

So, our consensus would be %v formatter should be removed
completely from the format function.

I think so tag that quotes all without numbers can be very useful, but
it isn't too much important for me. I can live without them.

Regards

Pavel

--
Itagaki Takahiro

Attachments:

format.difftext/x-patch; charset=US-ASCII; name=format.diffDownload
*** ./doc/src/sgml/func.sgml.orig	2010-08-24 08:30:43.000000000 +0200
--- ./doc/src/sgml/func.sgml	2010-09-09 08:58:12.515512369 +0200
***************
*** 1272,1277 ****
--- 1272,1280 ----
      <primary>encode</primary>
     </indexterm>
     <indexterm>
+     <primary>format</primary>
+    </indexterm>
+    <indexterm>
      <primary>initcap</primary>
     </indexterm>
     <indexterm>
***************
*** 1482,1487 ****
--- 1485,1507 ----
  
        <row>
         <entry>
+         <literal><function>format</function>(<parameter>formatstr</parameter> <type>text</type> 
+         [, <parameter>str</parameter> <type>"any"</type> [, ...] ])</literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+          This functions can be used to create a formated string or message. There are allowed
+          three types of tags: %s as string, %i as SQL identifiers and %l as SQL literals. Attention:
+          result for %i and %l must not be same as result of <function>quote_ident</function> and 
+          <function>quote_literal</function> functions, because this function doesn't try to coerce 
+          parameters to <type>text</type> type and directly use a type's output functions.
+        </entry>
+        <entry><literal>format('Hello %s', 'World')</literal></entry>
+        <entry><literal>Hello World</literal></entry>
+       </row>       
+ 
+       <row>
+        <entry>
          <literal><function>encode</function>(<parameter>data</parameter> <type>bytea</type>,
          <parameter>type</parameter> <type>text</type>)</literal>
         </entry>
*** ./src/backend/utils/adt/varlena.c.orig	2010-08-24 08:30:43.000000000 +0200
--- ./src/backend/utils/adt/varlena.c	2010-09-09 08:44:40.450637505 +0200
***************
*** 21,28 ****
--- 21,30 ----
  #include "libpq/md5.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
+ #include "parser/parse_coerce.h"
  #include "parser/scansup.h"
  #include "regex/regex.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/bytea.h"
  #include "utils/lsyscache.h"
***************
*** 3702,3704 ****
--- 3704,3860 ----
  
  	PG_RETURN_TEXT_P(result);
  }
+ 
+ /*
+  * Text format - a variadic function replaces %c symbols with entered text.
+  */
+ Datum
+ text_format(PG_FUNCTION_ARGS)
+ {
+ 	text	   *fmt;
+ 	StringInfoData		str;
+ 	char		*cp;
+ 	int			i = 1;
+ 	size_t		len;
+ 	char		*start_ptr;
+ 	char 			*end_ptr;
+ 	text	*result;
+ 
+ 	/* When format string is null, returns null */
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 
+ 	fmt = PG_GETARG_TEXT_PP(0);
+ 	len = VARSIZE_ANY_EXHDR(fmt);
+ 	start_ptr = VARDATA_ANY(fmt);
+ 	end_ptr = start_ptr + len - 1;
+ 
+ 	initStringInfo(&str);
+ 	for (cp = start_ptr; cp <= end_ptr; cp++)
+ 	{
+ 		/*
+ 		 * there are allowed escape char - '\'
+ 		 */
+ 		if (cp[0] == '\\')
+ 		{
+ 			/* check next char */
+ 			if (cp < end_ptr)
+ 			{
+ 				switch (cp[1])
+ 				{
+ 					case '\\':
+ 					case '%':
+ 						appendStringInfoChar(&str, cp[1]);
+ 						break;
+ 						
+ 					default:
+ 						ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							 errmsg("unsupported escape sequence \\%c", cp[1])));
+ 				}
+ 				cp++;
+ 			}
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("broken escape sequence")));
+ 		}
+ 		else if (cp[0] == '%')
+ 		{
+ 			char 	tag;
+ 			
+ 			/* initial check */
+ 			if (cp == end_ptr)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing formating tag")));
+ 		
+ 			if (i >= PG_NARGS())
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("too few parameters for format function")));
+ 			
+ 			tag = cp[1];
+ 			cp++;
+ 
+ 			if (!PG_ARGISNULL(i))
+ 		        {
+ 				Oid	valtype;
+ 				Datum	value;
+ 				Oid                     typoutput;
+ 				bool            typIsVarlena;
+ 		        
+ 				/* append n-th value */
+ 				value = PG_GETARG_DATUM(i);
+ 				valtype = get_fn_expr_argtype(fcinfo->flinfo, i);
+ 				getTypeOutputInfo(valtype, &typoutput, &typIsVarlena);
+ 				
+ 				if (tag == 's')
+ 				{
+ 					/* show it as unspecified string */
+ 					appendStringInfoString(&str, OidOutputFunctionCall(typoutput, value));
+ 				}
+ 				else if (tag == 'i')
+ 				{
+ 					char *target_value;
+ 				
+ 					/* show it as sql identifier */
+ 					target_value = OidOutputFunctionCall(typoutput, value);
+ 					appendStringInfoString(&str, quote_identifier(target_value));
+ 				}
+ 				else if (tag == 'l')
+ 				{
+ 					text *txt;
+ 					text *quoted_txt;
+ 				
+ 					/* get text value and quotize */
+ 					txt = cstring_to_text(OidOutputFunctionCall(typoutput, value));
+ 					quoted_txt = DatumGetTextP(DirectFunctionCall1(quote_literal,
+ 													    PointerGetDatum(txt)));
+ 					appendStringInfoString(&str, text_to_cstring(quoted_txt));
+ 				}
+ 				else
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unsupported tag \"%%%c\"", tag)));
+ 			}
+ 			else
+ 			{
+ 				if (tag == 'i')
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 						 errmsg("NULL is used as SQL identifier")));
+ 				else if (tag == 'l')
+ 					appendStringInfoString(&str, "NULL");
+ 				else if (tag != 's')
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unsupported tag \"%%%c\"", tag)));
+ 			}
+ 			i++;
+ 		}
+ 		else
+ 			appendStringInfoChar(&str, cp[0]);
+ 	}
+ 
+ 	/* check if all arguments are used */
+ 	if (i != PG_NARGS())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("too many parameters for format function")));
+ 
+ 	result = cstring_to_text_with_len(str.data, str.len);
+ 	pfree(str.data);
+ 
+         PG_RETURN_TEXT_P(result);
+ }
+ 
+ /*
+  * Non variadic text_format function - only wrapper
+  *   Print and check format string
+  */
+ Datum
+ text_format_nv(PG_FUNCTION_ARGS)
+ {
+ 	return text_format(fcinfo);
+ }
*** ./src/include/catalog/pg_proc.h.orig	2010-08-24 08:30:43.000000000 +0200
--- ./src/include/catalog/pg_proc.h	2010-09-09 08:37:11.321512358 +0200
***************
*** 2732,2737 ****
--- 2732,2741 ----
  DESCR("return the last n characters");
  DATA(insert OID = 3062 ( reverse	PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_  text_reverse  _null_ _null_ _null_ ));
  DESCR("reverse text");
+ DATA(insert OID = 3063 ( format		PGNSP PGUID 12 1 0 2276 f f f f f s 2 0 25 "25 2276" "{25,2276}" "{i,v}" _null_ _null_  text_format _null_ _null_ _null_ ));
+ DESCR("format text message");
+ DATA(insert OID = 3064 ( format		PGNSP PGUID 12 1 0 0 f f f f f s 1 0 25 "25" _null_ _null_ _null_ _null_  text_format_nv _null_ _null_ _null_ ));
+ DESCR("format text message");
  
  DATA(insert OID = 1810 (  bit_length	   PGNSP PGUID 14 1 0 0 f f f t f i 1 0 23 "17" _null_ _null_ _null_ _null_ "select pg_catalog.octet_length($1) * 8" _null_ _null_ _null_ ));
  DESCR("length in bits");
*** ./src/include/utils/builtins.h.orig	2010-08-24 08:30:44.000000000 +0200
--- ./src/include/utils/builtins.h	2010-09-09 08:38:33.001512464 +0200
***************
*** 738,743 ****
--- 738,745 ----
  extern Datum text_left(PG_FUNCTION_ARGS);
  extern Datum text_right(PG_FUNCTION_ARGS);
  extern Datum text_reverse(PG_FUNCTION_ARGS);
+ extern Datum text_format(PG_FUNCTION_ARGS);
+ extern Datum text_format_nv(PG_FUNCTION_ARGS);
  
  /* version.c */
  extern Datum pgsql_version(PG_FUNCTION_ARGS);
*** ./src/test/regress/expected/text.out.orig	2010-08-24 08:30:44.000000000 +0200
--- ./src/test/regress/expected/text.out	2010-09-09 09:09:59.000000000 +0200
***************
*** 118,120 ****
--- 118,139 ----
    5 | ahoj | ahoj
  (11 rows)
  
+ select format('some text');
+   format   
+ -----------
+  some text
+ (1 row)
+ 
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10);
+                           format                           
+ -----------------------------------------------------------
+  -- insert into "My tab" (a,b,c) values('Hello',NULL,'10')
+ (1 row)
+ 
+ -- should fail
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', NULL, 'Hello', NULL, 10);
+ ERROR:  NULL is used as SQL identifier
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', NULL, 'Hello');
+ ERROR:  too few parameters for format function
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10, 10);
+ ERROR:  too many parameters for format function
*** ./src/test/regress/sql/text.sql.orig	2010-08-24 08:30:44.000000000 +0200
--- ./src/test/regress/sql/text.sql	2010-09-09 09:17:50.920668808 +0200
***************
*** 41,43 ****
--- 41,49 ----
  select concat_ws(NULL,10,20,null,30) is null;
  select reverse('abcde');
  select i, left('ahoj', i), right('ahoj', i) from generate_series(-5, 5) t(i) order by i;
+ select format('some text');
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10);
+ -- should fail
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', NULL, 'Hello', NULL, 10);
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', NULL, 'Hello');
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10, 10);
stringfunc.difftext/x-patch; charset=US-ASCII; name=stringfunc.diffDownload
*** ./contrib/stringfunc/expected/stringfunc.out.orig	2010-09-09 11:04:09.726641291 +0200
--- ./contrib/stringfunc/expected/stringfunc.out	2010-09-09 13:24:04.000000000 +0200
***************
*** 0 ****
--- 1,101 ----
+ SET client_min_messages = warning;
+ \set ECHO none
+ RESET client_min_messages;
+ -- sprintf test
+ select sprintf('>>>%10s %10d<<<', 'hello', 10);
+            sprintf           
+ -----------------------------
+  >>>     hello         10<<<
+ (1 row)
+ 
+ select sprintf('>>>%-10s<<<', 'hello');
+      sprintf      
+ ------------------
+  >>>hello     <<<
+ (1 row)
+ 
+ select sprintf('>>>%5.2<<<', 'abcde');
+ ERROR:  unsupported sprintf format tag '<'
+ select sprintf('>>>%*s<<<', 10, 'abcdef');
+      sprintf      
+ ------------------
+  >>>    abcdef<<<
+ (1 row)
+ 
+ select sprintf('>>>%*s<<<', 10); -- error
+ ERROR:  too few parameters specified for printf function
+ select sprintf('%010d', 10);
+   sprintf   
+ ------------
+  0000000010
+ (1 row)
+ 
+ select sprintf('%.6d', 10);
+  sprintf 
+ ---------
+  000010
+ (1 row)
+ 
+ select sprintf('%d', 100.0/3.0);
+  sprintf 
+ ---------
+  33
+ (1 row)
+ 
+ select sprintf('%e', 100.0/3.0);
+    sprintf    
+ --------------
+  3.333333e+01
+ (1 row)
+ 
+ select sprintf('%f', 100.0/3.0);
+   sprintf  
+ -----------
+  33.333333
+ (1 row)
+ 
+ select sprintf('%g', 100.0/3.0);
+  sprintf 
+ ---------
+  33.3333
+ (1 row)
+ 
+ select sprintf('%7.4e', 100.0/3.0);
+   sprintf   
+ ------------
+  3.3333e+01
+ (1 row)
+ 
+ select sprintf('%7.4f', 100.0/3.0);
+  sprintf 
+ ---------
+  33.3333
+ (1 row)
+ 
+ select sprintf('%7.4g', 100.0/3.0);
+  sprintf 
+ ---------
+    33.33
+ (1 row)
+ 
+ select sprintf('%d', NULL);
+  sprintf 
+ ---------
+  <NULL>
+ (1 row)
+ 
+ select substitute('second parameter is $2 and first parameter is $1', 'FIRST', 'SECOND');
+                        substitute                        
+ ---------------------------------------------------------
+  second parameter is SECOND and first parameter is FIRST
+ (1 row)
+ 
+ -- should fail
+ select substitute('third parameter is $3 and first parameter is $1', 'FIRST', 'SECOND');
+ ERROR:  positional placeholder "$3" is not valid
+ select substitute(' NULL parameter is $1', NULL);
+         substitute         
+ ---------------------------
+   NULL parameter is <NULL>
+ (1 row)
+ 
*** ./contrib/stringfunc/Makefile.orig	2010-09-09 11:00:55.877627534 +0200
--- ./contrib/stringfunc/Makefile	2010-09-09 10:34:29.769641263 +0200
***************
*** 0 ****
--- 1,20 ----
+ # $PostgreSQL: pgsql/contrib/stringfunc/Makefile,v 1.23 2009/08/28 20:26:18 petere Exp $
+ 
+ MODULE_big = stringfunc
+ OBJS= stringfunc.o
+ 
+ DATA_built = stringfunc.sql
+ DATA = uninstall_stringfunc.sql
+ REGRESS = stringfunc
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/stringfunc
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 
*** ./contrib/stringfunc/sql/stringfunc.sql.orig	2010-09-09 10:58:22.000000000 +0200
--- ./contrib/stringfunc/sql/stringfunc.sql	2010-09-09 13:23:54.676512814 +0200
***************
*** 0 ****
--- 1,28 ----
+ SET client_min_messages = warning;
+ \set ECHO none
+ \i stringfunc.sql
+ \set ECHO all
+ RESET client_min_messages;
+ 
+ -- sprintf test
+ select sprintf('>>>%10s %10d<<<', 'hello', 10);
+ select sprintf('>>>%-10s<<<', 'hello');
+ select sprintf('>>>%5.2<<<', 'abcde');
+ select sprintf('>>>%*s<<<', 10, 'abcdef');
+ select sprintf('>>>%*s<<<', 10); -- error
+ select sprintf('%010d', 10);
+ select sprintf('%.6d', 10);
+ 
+ select sprintf('%d', 100.0/3.0);
+ select sprintf('%e', 100.0/3.0);
+ select sprintf('%f', 100.0/3.0);
+ select sprintf('%g', 100.0/3.0);
+ select sprintf('%7.4e', 100.0/3.0);
+ select sprintf('%7.4f', 100.0/3.0);
+ select sprintf('%7.4g', 100.0/3.0);
+ select sprintf('%d', NULL);
+ 
+ select substitute('second parameter is $2 and first parameter is $1', 'FIRST', 'SECOND');
+ -- should fail
+ select substitute('third parameter is $3 and first parameter is $1', 'FIRST', 'SECOND');
+ select substitute(' NULL parameter is $1', NULL);
\ No newline at end of file
*** ./contrib/stringfunc/stringfunc.c.orig	2010-09-09 08:24:11.000000000 +0200
--- ./contrib/stringfunc/stringfunc.c	2010-09-09 13:23:18.796512637 +0200
***************
*** 0 ****
--- 1,662 ----
+ #include "postgres.h"
+ #include "string.h"
+ 
+ #include "catalog/pg_type.h"
+ #include "lib/stringinfo.h"
+ #include "mb/pg_wchar.h"
+ #include "parser/parse_coerce.h"
+ #include "utils/array.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ #define CHECK_PAD(symbol, pad_value)	\
+ do { \
+ 	if (pdesc->flags & pad_value)		\
+ 		ereport(ERROR,  	\
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
+ 				 errmsg("broken sprintf format"),          \
+ 				 errdetail("Format string is '%s'.", TextDatumGetCString(fmt)), 	   \
+ 				 errhint("Symbol '%c' can be used only one time.", symbol))); \
+ 	pdesc->flags |= pad_value; \
+ } while(0);
+ 
+ /*
+  * string functions
+  */
+ Datum	stringfunc_sprintf(PG_FUNCTION_ARGS);
+ Datum	stringfunc_sprintf_nv(PG_FUNCTION_ARGS);
+ Datum	stringfunc_substitute(PG_FUNCTION_ARGS);
+ Datum	stringfunc_substitute_nv(PG_FUNCTION_ARGS);
+ 
+ 
+ /*
+  * V1 registrations
+  */
+ PG_FUNCTION_INFO_V1(stringfunc_sprintf);
+ PG_FUNCTION_INFO_V1(stringfunc_sprintf_nv);
+ PG_FUNCTION_INFO_V1(stringfunc_substitute);
+ PG_FUNCTION_INFO_V1(stringfunc_substitute_nv);
+ 
+ typedef enum {
+     stringfunc_ZERO       =   1,
+     stringfunc_SPACE      =   2,
+     stringfunc_PLUS       =   4,
+     stringfunc_MINUS      =   8,
+     stringfunc_STAR_WIDTH =  16,
+     stringfunc_SHARP      =  32,
+     stringfunc_WIDTH      =  64,
+     stringfunc_PRECISION  = 128,
+     stringfunc_STAR_PRECISION = 256
+ } PlaceholderTags;
+ 
+ typedef struct {
+ 	int	flags;
+ 	char		field_type;
+ 	char		lenmod;
+ 	int32		width;
+ 	int32		precision;
+ } FormatPlaceholderData;
+ 
+ typedef FormatPlaceholderData *PlaceholderDesc;
+ 
+ static Datum 
+ castValueTo(Datum value, Oid targetTypeId, Oid inputTypeId)
+ {
+ 	Oid		funcId;
+ 	CoercionPathType	pathtype;
+ 	FmgrInfo	finfo;
+ 	Datum	   result;
+ 
+ 	if (inputTypeId != UNKNOWNOID)
+ 		pathtype = find_coercion_pathway(targetTypeId, inputTypeId, 
+ 									COERCION_EXPLICIT, 
+ 									&funcId);
+ 	else
+ 		pathtype = COERCION_PATH_COERCEVIAIO;
+ 	
+ 	switch (pathtype)
+ 	{
+ 		case COERCION_PATH_RELABELTYPE:
+ 			result = value;
+ 			break;
+ 		case COERCION_PATH_FUNC:
+ 			{
+ 				Assert(OidIsValid(funcId));
+ 				
+ 				fmgr_info(funcId, &finfo);
+ 				result = FunctionCall1(&finfo, value);
+ 			}
+ 			break;
+ 		
+ 		case COERCION_PATH_COERCEVIAIO:
+ 			{
+ 				Oid                     typoutput;
+ 				Oid			typinput;
+ 				bool            typIsVarlena;
+ 				Oid		typIOParam;
+ 				char 	*extval;
+ 		        
+ 				getTypeOutputInfo(inputTypeId, &typoutput, &typIsVarlena);
+ 				extval = OidOutputFunctionCall(typoutput, value);
+ 				
+ 				getTypeInputInfo(targetTypeId, &typinput, &typIOParam);
+ 				result = OidInputFunctionCall(typinput, extval, typIOParam, -1);
+ 			}
+ 			break;
+ 		
+ 		default:
+ 			elog(ERROR, "failed to find conversion function from %s to %s",
+ 					format_type_be(inputTypeId), format_type_be(targetTypeId));
+ 			/* be compiler quiet */
+ 			result = (Datum) 0;
+ 	}
+ 	
+ 	return result;
+ }
+ 
+ /*
+  * parse and verify sprintf parameter 
+  *
+  *      %[flags][width][.precision]specifier
+  *
+  */
+ static char *
+ parsePlaceholder(char *src, char *end_ptr, PlaceholderDesc pdesc, text *fmt)
+ {
+ 	char		c;
+ 
+ 	pdesc->field_type = '\0';
+ 	pdesc->lenmod = '\0';
+ 	pdesc->flags = 0;
+ 	pdesc->width = 0;
+ 	pdesc->precision = 0;
+ 
+ 	while (src < end_ptr && pdesc->field_type == '\0')
+ 	{
+ 		c = *++src;
+ 		
+ 		switch (c)
+ 		{
+ 			case '0':
+ 				CHECK_PAD('0', stringfunc_ZERO);
+ 				break;
+ 			case ' ':
+ 				CHECK_PAD(' ', stringfunc_SPACE);
+ 				break;
+ 			case '+':
+ 				CHECK_PAD('+', stringfunc_PLUS);
+ 				break;
+ 			case '-':
+ 				CHECK_PAD('-', stringfunc_MINUS);
+ 				break;
+ 			case '*':
+ 				CHECK_PAD('*', stringfunc_STAR_WIDTH);
+ 				break;
+ 			case '#':
+ 				CHECK_PAD('#', stringfunc_SHARP);
+ 				break;
+ 			case 'o': case 'i': case 'e': case 'E': case 'f': 
+ 			case 'g': case 'd': case 's': case 'x': case 'X': 
+ 				pdesc->field_type = *src;
+ 				break;
+ 			case '1': case '2': case '3': case '4':
+ 			case '5': case '6': case '7': case '8': case '9':
+ 				CHECK_PAD('9', stringfunc_WIDTH);
+ 				pdesc->width = c - '0';
+ 				while (src < end_ptr && isdigit(src[1]))
+ 					pdesc->width = pdesc->width * 10 + *++src - '0';
+ 				break;
+ 			case '.':
+ 				if (src < end_ptr)
+ 				{
+ 					if (src[1] == '*')
+ 					{
+ 						CHECK_PAD('.', stringfunc_STAR_PRECISION);
+ 						src++;
+ 					}
+ 					else
+ 					{
+ 						/*
+ 						 * when no one digit is entered, then precision
+ 						 * is zero - digits are optional.
+ 						 */
+ 						CHECK_PAD('.', stringfunc_PRECISION);
+ 						while (src < end_ptr && isdigit(src[1]))
+ 						{
+ 							pdesc->precision = pdesc->precision * 10 + *++src - '0';
+ 						}
+ 					}
+ 				}
+ 				else 
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							 errmsg("broken sprintf format"),
+ 							 errdetail("missing precision value")));
+ 				break;
+ 
+ 			default:
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("unsupported sprintf format tag '%c'", c)));
+ 		}
+ 	}
+ 
+ 	if (pdesc->field_type == '\0')
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("broken sprintf format")));
+ 
+ 	return src;
+ }
+ 
+ static char *
+ currentFormat(StringInfo str, PlaceholderDesc pdesc)
+ {
+ 	resetStringInfo(str);
+ 	appendStringInfoChar(str,'%');
+ 	
+ 	if (pdesc->flags & stringfunc_ZERO)
+ 		appendStringInfoChar(str, '0');
+ 
+ 	if (pdesc->flags & stringfunc_MINUS)
+ 		appendStringInfoChar(str, '-');
+ 
+ 	if (pdesc->flags & stringfunc_PLUS)
+ 		appendStringInfoChar(str, '+');
+ 		
+ 	if (pdesc->flags & stringfunc_SPACE)
+ 		appendStringInfoChar(str, ' ');
+ 		
+ 	if (pdesc->flags & stringfunc_SHARP)
+ 		appendStringInfoChar(str, '#');
+ 
+ 	if ((pdesc->flags & stringfunc_WIDTH) || (pdesc->flags & stringfunc_STAR_WIDTH))
+ 		appendStringInfoChar(str, '*');
+ 		
+ 	if ((pdesc->flags & stringfunc_PRECISION) || (pdesc->flags & stringfunc_STAR_PRECISION))
+ 		appendStringInfoString(str, ".*");
+ 
+ 	/* Append l or ll. Decision is based on value of INT64_FORMAT */
+ 	if (pdesc->lenmod == 'l')
+ 	{
+ 		if (strcmp(INT64_FORMAT, "%lld") == 0)
+ 			appendStringInfoString(str, "ll");
+ 		else
+ 			appendStringInfoString(str, "l");
+ 	}
+ 	else if (pdesc->lenmod != '\0')
+ 		appendStringInfoChar(str, pdesc->lenmod);
+ 
+ 	appendStringInfoChar(str, pdesc->field_type);
+ 	
+ 	return str->data;
+ }
+ 
+ /*
+  * simulate %+width.precion%s format of sprintf function 
+  */
+ static void 
+ append_string(StringInfo str,  PlaceholderDesc pdesc, char *string)
+ {
+ 	int	nchars = 0;				/* length of substring in chars */
+ 	int	binlen = 0;				/* length of substring in bytes */
+ 
+ 	/*
+ 	 * apply precision - it means "show only first n chars", for strings - this flag is 
+ 	 * ignored for proprietary tags %lq and iq, because we can't to show a first n chars 
+ 	 * from possible quoted value. 
+ 	 */
+ 	if (pdesc->flags & stringfunc_PRECISION && pdesc->field_type != 'q')
+ 	{
+ 		char *ptr = string;
+ 		int	  len = pdesc->precision;
+ 		
+ 		if (pg_database_encoding_max_length() > 1)
+ 		{
+ 			while (*ptr && len > 0)
+ 			{
+ 				ptr += pg_mblen(ptr);
+ 				len--;
+ 				nchars++;
+ 			}
+ 		}
+ 		else
+ 		{
+ 			while (*ptr && len > 0)
+ 			{
+ 				ptr++;
+ 				len--;
+ 				nchars++;
+ 			}
+ 		}
+ 		
+ 		binlen = ptr - string;
+ 	}
+ 	else
+ 	{
+ 		/* there isn't precion specified, show complete string */
+ 		nchars = pg_mbstrlen(string);
+ 		binlen = strlen(string);
+ 	}
+ 	
+ 	/* when width is specified, then we have to solve left or right align */
+ 	if (pdesc->flags & stringfunc_WIDTH)
+ 	{
+ 		if (pdesc->width > nchars)
+ 		{
+ 			/* add neccessary spaces to begin or end */
+ 			if (pdesc->flags & stringfunc_MINUS)
+ 			{
+ 				/* allign to left */
+ 				appendBinaryStringInfo(str, string, binlen);
+ 				appendStringInfoSpaces(str, pdesc->width - nchars);
+ 			}
+ 			else
+ 			{
+ 				/* allign to right */
+ 				appendStringInfoSpaces(str, pdesc->width - nchars);
+ 				appendBinaryStringInfo(str, string, binlen);
+ 			}
+ 
+ 		}
+ 		else
+ 			/* just copy result to output */
+ 			appendBinaryStringInfo(str, string, binlen);
+ 	}
+ 	else
+ 		/* just copy result to output */
+ 		appendBinaryStringInfo(str, string, binlen);
+ }
+ 
+ /*
+  * Set width and precision when they are defined dynamicaly
+  */
+ static 
+ int setWidthAndPrecision(PlaceholderDesc pdesc, FunctionCallInfoData *fcinfo, int current)
+ {
+ 
+ 	/* 
+ 	 * don't allow ambiguous definition
+ 	 */
+ 	if ((pdesc->flags & stringfunc_WIDTH) && (pdesc->flags & stringfunc_STAR_WIDTH))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("broken sprintf format"),
+ 				 errdetail("ambiguous width definition")));
+ 
+ 	if ((pdesc->flags & stringfunc_PRECISION) && (pdesc->flags & stringfunc_STAR_PRECISION))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("broken sprintf format"),
+ 				 errdetail("ambiguous precision definition")));
+ 	if (pdesc->flags & stringfunc_STAR_WIDTH)
+ 	{
+ 		if (current >= PG_NARGS())
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("too few parameters specified for printf function")));
+ 		
+ 		if (PG_ARGISNULL(current))
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 				 errmsg("null value not allowed"),
+ 				 errhint("width (%dth) arguments is NULL", current)));
+ 		
+ 		pdesc->width = DatumGetInt32(castValueTo(PG_GETARG_DATUM(current), INT4OID, 
+ 									get_fn_expr_argtype(fcinfo->flinfo, current)));
+ 		/* reset flag */
+ 		pdesc->flags ^= stringfunc_STAR_WIDTH;
+ 		pdesc->flags |= stringfunc_WIDTH;
+ 		current += 1;
+ 	}
+ 	
+ 	if (pdesc->flags & stringfunc_STAR_PRECISION)
+ 	{
+ 		if (current >= PG_NARGS())
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("too few parameters specified for printf function")));
+ 		
+ 		if (PG_ARGISNULL(current))
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 				 errmsg("null value not allowed"),
+ 				 errhint("width (%dth) arguments is NULL", current)));
+ 		
+ 		pdesc->precision = DatumGetInt32(castValueTo(PG_GETARG_DATUM(current), INT4OID, 
+ 									get_fn_expr_argtype(fcinfo->flinfo, current)));
+ 		/* reset flags */
+ 		pdesc->flags ^= stringfunc_STAR_PRECISION;
+ 		pdesc->flags |= stringfunc_PRECISION;
+ 		current += 1;
+ 	}
+ 	
+ 	return current;
+ }
+ 
+ /*
+  * sprintf function - it is wrapper for libc vprintf function
+  *
+  *    ensure PostgreSQL -> C casting
+  */
+ Datum
+ stringfunc_sprintf(PG_FUNCTION_ARGS)
+ {
+ 	text	   *fmt;
+ 	StringInfoData   str;
+ 	StringInfoData   format_str;
+ 	char		*cp;
+ 	int			i = 1;
+ 	size_t		len;
+ 	char		*start_ptr,
+ 				*end_ptr;
+ 	FormatPlaceholderData		pdesc;
+ 	text *result;
+ 
+ 	Oid                     typoutput;
+ 	bool            typIsVarlena;
+ 	Datum 	value;
+ 	Oid valtype;
+ 
+ 	/* When format string is null, returns null */
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 
+ 	fmt = PG_GETARG_TEXT_PP(0);
+ 	len = VARSIZE_ANY_EXHDR(fmt);
+ 	start_ptr = VARDATA_ANY(fmt);
+ 	end_ptr = start_ptr + len - 1;
+ 
+ 	initStringInfo(&str);
+ 	initStringInfo(&format_str);
+ 
+ 	for (cp = start_ptr; cp <= end_ptr; cp++)
+ 	{
+ 		if (cp[0] == '%')
+ 		{
+ 			/* when cp is not pointer on last char, check %% */
+ 			if (cp < end_ptr && cp[1] == '%')
+ 			{
+ 				appendStringInfoChar(&str, cp[1]);
+ 				cp++;
+ 				continue;
+ 			}
+ 
+ 			cp = parsePlaceholder(cp, end_ptr, &pdesc, fmt);
+ 			i = setWidthAndPrecision(&pdesc, fcinfo, i);
+ 			
+ 			if (i >= PG_NARGS())
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("too few parameters specified for printf function")));
+ 
+ 			if (!PG_ARGISNULL(i))
+ 			{
+ 				/* append n-th value */
+ 				value = PG_GETARG_DATUM(i);
+ 				valtype = get_fn_expr_argtype(fcinfo->flinfo, i);
+ 				
+ 				/* convert value to target type */
+ 				switch (pdesc.field_type)
+ 				{
+ 					case 'o': case 'd': case 'i': case 'x': case 'X':
+ 						{
+ 							int64	target_value;
+ 							const char 		*format;
+ 							
+ 							pdesc.lenmod = 'l';
+ 							target_value = DatumGetInt64(castValueTo(value, INT8OID, valtype));
+ 							format = currentFormat(&format_str, &pdesc);
+ 							
+ 							if ((pdesc.flags & stringfunc_WIDTH) && (pdesc.flags & stringfunc_PRECISION))
+ 								appendStringInfo(&str, format, pdesc.width, pdesc.precision, target_value);
+ 							else if (pdesc.flags & stringfunc_WIDTH)
+ 								appendStringInfo(&str, format, pdesc.width, target_value);
+ 							else if (pdesc.flags & stringfunc_PRECISION)
+ 								appendStringInfo(&str, format, pdesc.precision, target_value);
+ 							else
+ 								appendStringInfo(&str, format, target_value);
+ 						}
+ 						break;
+ 					case 'e': case 'f': case 'g': case 'G': case 'E':
+ 						{
+ 							float8	target_value;
+ 							const char 		*format;
+ 							
+ 							target_value = DatumGetFloat8(castValueTo(value, FLOAT8OID, valtype));
+ 							format = currentFormat(&format_str, &pdesc);
+ 							
+ 							if ((pdesc.flags & stringfunc_WIDTH) && (pdesc.flags & stringfunc_PRECISION))
+ 								appendStringInfo(&str, format, pdesc.width, pdesc.precision, target_value);
+ 							else if (pdesc.flags & stringfunc_WIDTH)
+ 								appendStringInfo(&str, format, pdesc.width, target_value);
+ 							else if (pdesc.flags & stringfunc_PRECISION)
+ 								appendStringInfo(&str, format, pdesc.precision, target_value);
+ 							else
+ 								appendStringInfo(&str, format, target_value);
+ 						}
+ 						break;
+ 					case 's':
+ 						{
+ 							char		*target_value;
+ 
+ 							getTypeOutputInfo(valtype, &typoutput, &typIsVarlena);
+ 							target_value = OidOutputFunctionCall(typoutput, value);
+ 							
+ 							append_string(&str, &pdesc, target_value);
+ 							pfree(target_value);
+ 						}
+ 						break;
+ 					default:
+ 						elog(ERROR, "unknown format: %c", pdesc.field_type);
+ 				}
+ 			}
+ 			else
+ 				/* append a NULL string */
+ 				append_string(&str, &pdesc, "<NULL>");
+ 			i++;
+ 		}
+ 		else
+ 			appendStringInfoChar(&str, cp[0]);
+ 	}
+ 
+ 	/* check if all arguments are used */
+ 	if (i != PG_NARGS())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("too many parameters for printf function")));
+ 	result = cstring_to_text_with_len(str.data, str.len);
+ 	
+ 	pfree(str.data);
+ 	pfree(format_str.data);
+ 
+ 	PG_RETURN_TEXT_P(result);
+ }
+ 
+ /*
+  * only wrapper
+  */
+ Datum
+ stringfunc_sprintf_nv(PG_FUNCTION_ARGS)
+ {
+ 	return stringfunc_sprintf(fcinfo);
+ }
+ 
+ 
+ /*
+  * Substitute a positional parameters by value
+  */
+ Datum
+ stringfunc_substitute(PG_FUNCTION_ARGS)
+ {
+ 	text	   *fmt;
+ 	StringInfoData		str;
+ 	char		*cp;
+ 	size_t		len;
+ 	char		*start_ptr;
+ 	char		*end_ptr;
+ 	text	*result;
+ 	ArrayType *array;
+ 	Oid			elmtype;
+ 	int16		elmlen;
+ 	bool		elmbyval;
+ 	char		elmalign;
+ 	int			num_elems = 0;
+ 	Datum	   *elem_values;
+ 	bool	   *elem_nulls;
+ 
+ 	fmt = PG_GETARG_TEXT_PP(0);
+ 	len = VARSIZE_ANY_EXHDR(fmt);
+ 	start_ptr = VARDATA_ANY(fmt);
+ 	end_ptr = start_ptr + len - 1;
+ 	
+ 	if (PG_NARGS() == 2)
+ 	{
+ 		array = PG_GETARG_ARRAYTYPE_P(1);
+ 		elmtype = ARR_ELEMTYPE(array);
+ 		get_typlenbyvalalign(elmtype, &elmlen, &elmbyval, &elmalign);
+ 
+ 		deconstruct_array(array, elmtype,
+ 						  elmlen, elmbyval, elmalign,
+ 						  &elem_values, &elem_nulls,
+ 						  &num_elems);
+ 	}
+ 
+ 	initStringInfo(&str);
+ 	for (cp = start_ptr; cp <= end_ptr; cp++)
+ 	{
+ 		/*
+ 		 * there are allowed escape char - '\'
+ 		 */
+ 		if (cp[0] == '\\')
+ 		{
+ 			/* check next char */
+ 			if (cp < end_ptr)
+ 			{
+ 				switch (cp[1])
+ 				{
+ 					case '\\':
+ 					case '$':
+ 						appendStringInfoChar(&str, cp[1]);
+ 						break;
+ 						
+ 					default:
+ 						ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							 errmsg("unsuported escape sequence \"\\%c\"", cp[1])));
+ 				}
+ 				cp++;
+ 			}
+ 			else
+ 				elog(ERROR, "broken escape sequence");
+ 		}
+ 		else if (cp[0] == '$')
+ 		{
+ 			long pos;
+ 			char *endptr;
+ 
+ 			/* initial check */
+ 			if (cp == end_ptr)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("missing a parameter position specification")));
+ 		
+ 			if (!isdigit(cp[1]))
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("expected a numeric value")));
+ 				
+ 			pos = strtol(&cp[1], &endptr, 10);
+ 			cp = endptr - 1;
+ 			
+ 			if (pos < 1 || pos > num_elems)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("positional placeholder \"$%ld\" is not valid", pos)));
+ 			
+ 			if (!elem_nulls[pos - 1])
+ 				appendStringInfoString(&str,  text_to_cstring(DatumGetTextP(elem_values[pos - 1])));
+ 			else
+ 				appendStringInfoString(&str, "<NULL>");
+ 		}
+ 		else
+ 			appendStringInfoChar(&str, cp[0]);
+ 	}
+ 
+ 	result = cstring_to_text_with_len(str.data, str.len);
+ 	pfree(str.data);
+ 
+         PG_RETURN_TEXT_P(result);
+ }
+ 
+ /*
+  * Non variadic text_substitute function - only wrapper
+  *   Print and check format string
+  */
+ Datum
+ stringfunc_substitute_nv(PG_FUNCTION_ARGS)
+ {
+ 	return text_format(fcinfo);
+ }
*** ./contrib/stringfunc/stringfunc.sql.in.orig	2010-09-09 11:01:11.917637557 +0200
--- ./contrib/stringfunc/stringfunc.sql.in	2010-09-09 10:50:22.476641406 +0200
***************
*** 0 ****
--- 1,25 ----
+ /* $PostgreSQL: pgsql/contrib/stringfunc/stringfunc.sql.in,v 1.25 2009/06/11 18:30:03 tgl Exp $ */
+ 
+ -- Adjust this setting to control where the objects get created.
+ SET search_path = public;
+ 
+ CREATE OR REPLACE FUNCTION sprintf(fmt text, VARIADIC args "any")
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_sprintf'
+ LANGUAGE C STABLE;
+ 
+ CREATE OR REPLACE FUNCTION sprintf(fmt text)
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_sprintf_nv'
+ LANGUAGE C STABLE;
+ 
+ CREATE OR REPLACE FUNCTION substitute(fmt text, VARIADIC args text[])
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_substitute'
+ LANGUAGE C STABLE;
+ 
+ CREATE OR REPLACE FUNCTION substitute(fmt text)
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_substitute_nv'
+ LANGUAGE C STABLE;
+ 
*** ./contrib/stringfunc/uninstall_stringfunc.sql.orig	2010-09-09 11:01:30.311637526 +0200
--- ./contrib/stringfunc/uninstall_stringfunc.sql	2010-09-09 10:55:23.053514595 +0200
***************
*** 0 ****
--- 1,10 ----
+ /* $PostgreSQL: pgsql/contrib/stringfunc/uninstall_stringfunc.sql,v 1.8 2008/04/14 17:05:32 tgl Exp $ */
+ 
+ -- Adjust this setting to control where the objects get dropped.
+ SET search_path = public;
+ 
+ DROP FUNCTION sprintf(fmt text, VARIADIC args "any");
+ DROP FUNCTION sprintf(fmt text);
+ DROP FUNCTION substitute(fmt text, VARIADIC args text[]);
+ DROP FUNCTION substitute(fmt text);
+ 
*** ./doc/src/sgml/contrib.sgml.orig	2010-06-14 19:25:24.000000000 +0200
--- ./doc/src/sgml/contrib.sgml	2010-09-09 13:52:12.080641043 +0200
***************
*** 115,120 ****
--- 115,121 ----
   &seg;
   &contrib-spi;
   &sslinfo;
+  &stringfunc;
   &tablefunc;
   &test-parser;
   &tsearch2;
*** ./doc/src/sgml/filelist.sgml.orig	2010-06-14 19:25:24.000000000 +0200
--- ./doc/src/sgml/filelist.sgml	2010-09-09 13:43:12.527516062 +0200
***************
*** 127,132 ****
--- 127,133 ----
  <!entity seg             SYSTEM "seg.sgml">
  <!entity contrib-spi     SYSTEM "contrib-spi.sgml">
  <!entity sslinfo         SYSTEM "sslinfo.sgml">
+ <!entity stringfunc      SYSTEM "stringfunc.sgml">
  <!entity tablefunc       SYSTEM "tablefunc.sgml">
  <!entity test-parser     SYSTEM "test-parser.sgml">
  <!entity tsearch2        SYSTEM "tsearch2.sgml">
*** ./doc/src/sgml/stringfunc.sgml.orig	2010-09-09 13:43:44.000000000 +0200
--- ./doc/src/sgml/stringfunc.sgml	2010-09-09 13:47:16.615512694 +0200
***************
*** 0 ****
--- 1,49 ----
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/stringfunc.sgml,v 1.2 2008/09/12 18:29:49 tgl Exp $ -->
+ 
+ <sect1 id="stringfunc">
+  <title>stringfunc</title>
+ 
+  <indexterm zone="stringfunc">
+   <primary>stringfunc</primary>
+  </indexterm>
+ 
+  <para>
+   The <filename>stringfunc</> module provides a additional function
+   for operation over strings. These functions can be used as patter
+   for developing a variadic functions.
+  </para>
+ 
+  <sect2>
+   <title>How to Use It</title>
+ 
+   <para>
+    Here's a simple example of usage:
+ 
+   <programlisting>
+    SELECT sprintf('formated number: %10d',10);
+    SELECT substitute('file '$1' doesn''t exists', '/var/log/applog');
+   </programlisting>
+   </para>
+ 
+   <para>
+    Nodul contains following functions:
+   </para>
+ 
+   <itemizedlist>
+    <listitem>
+     <para>
+       <function>sprintf(formatstr [, params])</> clasic sprintf function - it 
+       simplyfied version of libc sprintf function - it doesn't support length
+       modifiers and it will do necessary conversions automaticaly.  
+     </para>
+    </listitem>
+ 
+    <listitem>
+     <para>
+       <function>substitute(formatstr [, params])</> this function replace
+       a positional placeholers like $n by params.
+     </para>
+    </listitem>
+   </itemizedlist>
+  </sect2>
+ </sect1>
#23Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#22)
Re: string function - "format" function proposal

On Thu, Sep 9, 2010 at 8:57 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I am sending a updated version.

changes:
* tag %v removed from format function,
* proprietary tags %lq a iq removed from sprintf
* code cleaned

patch divided to two parts - format function and stringfunc (contains
sprintf function and substitute function)

=== Discussions about the spec ===
Two patches add format() into the core, and substitute() and sprintf() into
stringfunc contrib module. But will we have 3 versions of string formatters?

IMHO, substitute() is the best choice that we will have in the core because
functionalities in format() and sprintf() can be achieved by combination of
substitute() and quote_nullable(), quote_ident(), or to_char(). I think the
core will provide only simple and non-overlapped features. Users can write
wrapper functions by themselves if they think the description is redundant.

=== format.diff ===
* It has a reject in doc, but the hunk can be fixed easily.
1 out of 2 hunks FAILED -- saving rejects to file doc/src/sgml/func.sgml.rej
COMMENT: We have the function list in alphabetical order,
so format() should be inserted after encode().
* It can be built without compile warnings.
* Enough documentation and regression tests are included.

=== stringfunc.diff ===
* It can be applied cleanly and built without compile warnings.
* Documentation is included, but not enough.
COMMENT: According to existing docs, function list are described with
<variablelist> or <table>.
* Enough regression tests are included.
* COMMENT: stringfunc directory should be added to contrib/Makefile.

* BUG: stringfunc_substitute_nv() calls text_format().
I think we don't need stringfunc_substitute_nv at all.
It can be replaced by stringfunc_substitute(). _nv version is only
required if it is in the core because of sanity regression test.

* BUG?: The doc says sprintf() doesn't support length modifiers,
but it is actually supported in broken state:
postgres=# SELECT sprintf('%*s', 2, 'ABC');
sprintf
---------
ABC <= should be ERROR if unsupported, or AB if supported.
(1 row)

* BUG?: ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("unsupported tag \"%%%c\"", tag)));
Is the code ok if the tag (a char) is a partial byte of multi-byte character?
My machine prints ? in the case, but it might be platform-dependent.

=== Both patches ===
* Performance: I don't think those functions are not performance-critical,
but we could cache typoutput functions in fn_extra if needed.
record_out would be a reference.

* Coding: Whitespace and tabs are mixed in some places. They are not so
important because we will run pgindent, but careful choice will be
preferred even of a patch.

--
Itagaki Takahiro

#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#23)
2 attachment(s)
Re: string function - "format" function proposal

Hello

2010/9/29 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Thu, Sep 9, 2010 at 8:57 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

I am sending a updated version.

changes:
* tag %v removed from format function,
* proprietary tags %lq a iq removed from sprintf
* code cleaned

patch divided to two parts - format function and stringfunc (contains
sprintf function and substitute function)

=== Discussions about the spec ===
Two patches add format() into the core, and substitute() and sprintf() into
stringfunc contrib module. But will we have 3 versions of string formatters?

IMHO, substitute() is the best choice that we will have in the core because
functionalities in format() and sprintf() can be achieved by combination of
substitute() and quote_nullable(), quote_ident(), or to_char(). I think the
core will provide only simple and non-overlapped features. Users can write
wrapper functions by themselves if they think the description is redundant.

I think we need a three variants of formating functions - "format" in
core, fo simply creating and building a messages, a SQL strings,
"sprintf" for traditionalist in contrib - this functions isn't well
joined to SQL environment and it's too heavy - more it overwrite a
some functionality of "to_char" function. "substitute" function
provide just positional unformatted parameters - that isn't typical
ucase - so must not be in core too.

=== format.diff ===
* It has a reject in doc, but the hunk can be fixed easily.
   1 out of 2 hunks FAILED -- saving rejects to file doc/src/sgml/func.sgml.rej
 COMMENT: We have the function list in alphabetical order,

fixed

 so format() should be inserted after encode().
* It can be built without compile warnings.
* Enough documentation and regression tests are included.

=== stringfunc.diff ===
* It can be applied cleanly and built without compile warnings.
* Documentation is included, but not enough.
 COMMENT: According to existing docs, function list are described with
 <variablelist> or <table>.

fixed

* Enough regression tests are included.
* COMMENT: stringfunc directory should be added to contrib/Makefile.

* BUG: stringfunc_substitute_nv() calls text_format().
 I think we don't need stringfunc_substitute_nv at all.
 It can be replaced by stringfunc_substitute(). _nv version is only
 required if it is in the core because of sanity regression test.

you have a true - but I am not sure about coding patters for contribs,
so I designed it with respect to core sanity check.

* BUG?: The doc says sprintf() doesn't support length modifiers,
 but it is actually supported in broken state:

I was wrong in documentation - length modifiers are supported -
positional modifiers are not supported. fixed.

postgres=# SELECT sprintf('%*s', 2, 'ABC');
 sprintf
---------
 ABC      <= should be ERROR if unsupported, or AB if supported.
(1 row)

it works well - "with" modifier doesn't reduce string. String is
stripped by "precision" modifiers.

SELECT sprintf('%*.s', 2, ABC) --> AB

checked via gcc

please, try
printf(">>%s<<\n", "12345678");
printf(">>%3s<<\n", "12345678");
printf(">>%.3s<<\n", "12345678");
printf(">>%10.3s<<\n", "12345678");

do you understand me, why I "dislike" "printf"? How much people knows
well these formatting rules?

* BUG?: ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("unsupported tag \"%%%c\"", tag)));
 Is the code ok if the tag (a char) is a partial byte of multi-byte character?

it's bug - the supported tags are only single byte, but unsupported
tag can be multibyte character and must by showed correctly - fixed.

 My machine prints ? in the case, but it might be platform-dependent.

=== Both patches ===
* Performance: I don't think those functions are not performance-critical,
 but we could cache typoutput functions in fn_extra if needed.
 record_out would be a reference.

I though about it too and I checked it now - there is 0.4% performance
on 10000000 rows on my PC (format function) - so I don't do any
changes - caching of oids means a few lines more - but here isn't
expected effect.

* Coding: Whitespace and tabs are mixed in some places. They are not so
 important because we will run pgindent, but careful choice will be
 preferred even of a patch.

checked, fixed

Thank you very much for review

regards

Pavel Stehule

Show quoted text

--
Itagaki Takahiro

Attachments:

format.difftext/x-patch; charset=US-ASCII; name=format.diffDownload
*** ./doc/src/sgml/func.sgml.orig	2010-09-09 02:48:22.000000000 +0200
--- ./doc/src/sgml/func.sgml	2010-09-29 07:18:59.845395002 +0200
***************
*** 1272,1277 ****
--- 1272,1280 ----
      <primary>encode</primary>
     </indexterm>
     <indexterm>
+     <primary>format</primary>
+    </indexterm>
+    <indexterm>
      <primary>initcap</primary>
     </indexterm>
     <indexterm>
***************
*** 1495,1500 ****
--- 1498,1520 ----
         <entry><literal>encode(E'123\\000\\001', 'base64')</literal></entry>
         <entry><literal>MTIzAAE=</literal></entry>
        </row>       
+       
+       </row>
+        <entry>
+         <literal><function>format</function>(<parameter>formatstr</parameter> <type>text</type> 
+         [, <parameter>str</parameter> <type>"any"</type> [, ...] ])</literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+          This functions can be used to create a formated string or message. There are allowed
+          three types of tags: %s as string, %i as SQL identifiers and %l as SQL literals. Attention:
+          result for %i and %l must not be same as result of <function>quote_ident</function> and 
+          <function>quote_literal</function> functions, because this function doesn't try to coerce 
+          parameters to <type>text</type> type and directly use a type's output functions.
+        </entry>
+        <entry><literal>format('Hello %s', 'World')</literal></entry>
+        <entry><literal>Hello World</literal></entry>
+       </row>       
  
        <row>
         <entry><literal><function>initcap(<parameter>string</parameter>)</function></literal></entry>
*** ./src/backend/utils/adt/varlena.c.orig	2010-09-29 07:16:25.744395786 +0200
--- ./src/backend/utils/adt/varlena.c	2010-09-29 09:51:39.340270718 +0200
***************
*** 21,28 ****
--- 21,30 ----
  #include "libpq/md5.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
+ #include "parser/parse_coerce.h"
  #include "parser/scansup.h"
  #include "regex/regex.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/bytea.h"
  #include "utils/lsyscache.h"
***************
*** 48,53 ****
--- 50,61 ----
  	int			skiptable[256]; /* skip distance for given mismatched char */
  } TextPositionState;
  
+ typedef struct
+ {
+ 	int	n_valid_oids;
+ 	Oid	typoutput[FUNC_MAX_ARGS];
+ } format_fn_extra_cache;
+ 
  #define DatumGetUnknownP(X)			((unknown *) PG_DETOAST_DATUM(X))
  #define DatumGetUnknownPCopy(X)		((unknown *) PG_DETOAST_DATUM_COPY(X))
  #define PG_GETARG_UNKNOWN_P(n)		DatumGetUnknownP(PG_GETARG_DATUM(n))
***************
*** 3702,3704 ****
--- 3710,3866 ----
  
  	PG_RETURN_TEXT_P(result);
  }
+ 
+ /*
+  * Text format - a variadic function replaces %c symbols with entered text.
+  */
+ Datum
+ text_format(PG_FUNCTION_ARGS)
+ {
+ 	text	   *fmt;
+ 	StringInfoData		str;
+ 	char		*cp;
+ 	int			i = 1;
+ 	size_t		len;
+ 	char		*start_ptr;
+ 	char 			*end_ptr;
+ 	text	*result;
+ 
+ 	/* When format string is null, returns null */
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 
+ 	fmt = PG_GETARG_TEXT_PP(0);
+ 	len = VARSIZE_ANY_EXHDR(fmt);
+ 	start_ptr = VARDATA_ANY(fmt);
+ 	end_ptr = start_ptr + len - 1;
+ 
+ 	initStringInfo(&str);
+ 	for (cp = start_ptr; cp <= end_ptr; cp++)
+ 	{
+ 		/*
+ 		 * there are allowed escape char - '\'
+ 		 */
+ 		if (cp[0] == '\\')
+ 		{
+ 			/* check next char */
+ 			if (cp < end_ptr)
+ 			{
+ 				switch (cp[1])
+ 				{
+ 					case '\\':
+ 					case '%':
+ 						appendStringInfoChar(&str, cp[1]);
+ 						break;
+ 
+ 					default:
+ 						ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							 errmsg("unsupported escape sequence \\%.*s", pg_mblen(&cp[1]), &cp[1])));
+ 				}
+ 				cp++;
+ 			}
+ 			else
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("broken escape sequence")));
+ 		}
+ 		else if (cp[0] == '%')
+ 		{
+ 			char 	tag;
+ 
+ 			/* initial check */
+ 			if (cp == end_ptr)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("missing formating tag")));
+ 
+ 			if (i >= PG_NARGS())
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("too few parameters for format function")));
+ 
+ 			tag = cp[1];
+ 			cp++;
+ 
+ 			if (!PG_ARGISNULL(i))
+ 			{
+ 				Oid	valtype;
+ 				Datum	value;
+ 				Oid                     typoutput;
+ 				bool            typIsVarlena;
+ 
+ 				/* append n-th value */
+ 				value = PG_GETARG_DATUM(i);
+ 				valtype = get_fn_expr_argtype(fcinfo->flinfo, i);
+ 				getTypeOutputInfo(valtype, &typoutput, &typIsVarlena);
+ 
+ 				if (tag == 's')
+ 				{
+ 					/* show it as unspecified string */
+ 					appendStringInfoString(&str, OidOutputFunctionCall(typoutput, value));
+ 				}
+ 				else if (tag == 'i')
+ 				{
+ 					char *target_value;
+ 
+ 					/* show it as sql identifier */
+ 					target_value = OidOutputFunctionCall(typoutput, value);
+ 					appendStringInfoString(&str, quote_identifier(target_value));
+ 				}
+ 				else if (tag == 'l')
+ 				{
+ 					text *txt;
+ 					text *quoted_txt;
+ 
+ 					/* get text value and quotize */
+ 					txt = cstring_to_text(OidOutputFunctionCall(typoutput, value));
+ 					quoted_txt = DatumGetTextP(DirectFunctionCall1(quote_literal,
+ 													    PointerGetDatum(txt)));
+ 					appendStringInfoString(&str, text_to_cstring(quoted_txt));
+ 				}
+ 				else
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unsupported tag \"%%%.*s\"", pg_mblen(cp), cp)));
+ 			}
+ 			else
+ 			{
+ 				if (tag == 'i')
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 						 errmsg("NULL is used as SQL identifier")));
+ 				else if (tag == 'l')
+ 					appendStringInfoString(&str, "NULL");
+ 				else if (tag != 's')
+ 					ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unsupported tag \"%%%.*s\"", pg_mblen(cp), cp)));
+ 			}
+ 			i++;
+ 		}
+ 		else
+ 			appendStringInfoChar(&str, cp[0]);
+ 	}
+ 
+ 	/* check if all arguments are used */
+ 	if (i != PG_NARGS())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("too many parameters for format function")));
+ 
+ 	result = cstring_to_text_with_len(str.data, str.len);
+ 	pfree(str.data);
+ 
+         PG_RETURN_TEXT_P(result);
+ }
+ 
+ /*
+  * Non variadic text_format function - only wrapper
+  *   Print and check format string
+  */
+ Datum
+ text_format_nv(PG_FUNCTION_ARGS)
+ {
+ 	return text_format(fcinfo);
+ }
*** ./src/include/catalog/pg_proc.h.orig	2010-09-03 03:34:55.000000000 +0200
--- ./src/include/catalog/pg_proc.h	2010-09-29 07:16:42.658395449 +0200
***************
*** 2741,2746 ****
--- 2741,2750 ----
  DESCR("return the last n characters");
  DATA(insert OID = 3062 ( reverse	PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_  text_reverse  _null_ _null_ _null_ ));
  DESCR("reverse text");
+ DATA(insert OID = 3063 ( format		PGNSP PGUID 12 1 0 2276 f f f f f s 2 0 25 "25 2276" "{25,2276}" "{i,v}" _null_ _null_  text_format _null_ _null_ _null_ ));
+ DESCR("format text message");
+ DATA(insert OID = 3064 ( format		PGNSP PGUID 12 1 0 0 f f f f f s 1 0 25 "25" _null_ _null_ _null_ _null_  text_format_nv _null_ _null_ _null_ ));
+ DESCR("format text message");
  
  DATA(insert OID = 1810 (  bit_length	   PGNSP PGUID 14 1 0 0 f f f t f i 1 0 23 "17" _null_ _null_ _null_ _null_ "select pg_catalog.octet_length($1) * 8" _null_ _null_ _null_ ));
  DESCR("length in bits");
*** ./src/include/utils/builtins.h.orig	2010-09-03 03:34:55.000000000 +0200
--- ./src/include/utils/builtins.h	2010-09-29 07:16:42.661395833 +0200
***************
*** 742,747 ****
--- 742,749 ----
  extern Datum text_left(PG_FUNCTION_ARGS);
  extern Datum text_right(PG_FUNCTION_ARGS);
  extern Datum text_reverse(PG_FUNCTION_ARGS);
+ extern Datum text_format(PG_FUNCTION_ARGS);
+ extern Datum text_format_nv(PG_FUNCTION_ARGS);
  
  /* version.c */
  extern Datum pgsql_version(PG_FUNCTION_ARGS);
*** ./src/test/regress/expected/text.out.orig	2010-09-29 07:16:25.749395800 +0200
--- ./src/test/regress/expected/text.out	2010-09-29 07:16:42.662395333 +0200
***************
*** 118,120 ****
--- 118,139 ----
    5 | ahoj | ahoj
  (11 rows)
  
+ select format('some text');
+   format   
+ -----------
+  some text
+ (1 row)
+ 
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10);
+                           format                           
+ -----------------------------------------------------------
+  -- insert into "My tab" (a,b,c) values('Hello',NULL,'10')
+ (1 row)
+ 
+ -- should fail
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', NULL, 'Hello', NULL, 10);
+ ERROR:  NULL is used as SQL identifier
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', NULL, 'Hello');
+ ERROR:  too few parameters for format function
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10, 10);
+ ERROR:  too many parameters for format function
*** ./src/test/regress/sql/text.sql.orig	2010-09-29 07:16:25.751395218 +0200
--- ./src/test/regress/sql/text.sql	2010-09-29 07:16:42.662395333 +0200
***************
*** 41,43 ****
--- 41,49 ----
  select concat_ws(NULL,10,20,null,30) is null;
  select reverse('abcde');
  select i, left('ahoj', i), right('ahoj', i) from generate_series(-5, 5) t(i) order by i;
+ select format('some text');
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10);
+ -- should fail
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', NULL, 'Hello', NULL, 10);
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', NULL, 'Hello');
+ select format('-- insert into %i (a,b,c) values(%l,%l,%l)', 'My tab', 'Hello', NULL, 10, 10);
stringfunc.difftext/x-patch; charset=US-ASCII; name=stringfunc.diffDownload
*** ./contrib/Makefile.orig	2010-06-14 18:17:56.000000000 +0200
--- ./contrib/Makefile	2010-09-29 08:14:37.569273826 +0200
***************
*** 40,45 ****
--- 40,46 ----
  		pgstattuple	\
  		seg		\
  		spi		\
+ 		stringfunc	\
  		tablefunc	\
  		test_parser	\
  		tsearch2	\
*** ./contrib/stringfunc/expected/stringfunc.out.orig	2010-09-29 08:11:40.945271948 +0200
--- ./contrib/stringfunc/expected/stringfunc.out	2010-09-29 08:10:48.363270909 +0200
***************
*** 0 ****
--- 1,101 ----
+ SET client_min_messages = warning;
+ \set ECHO none
+ RESET client_min_messages;
+ -- sprintf test
+ select sprintf('>>>%10s %10d<<<', 'hello', 10);
+            sprintf           
+ -----------------------------
+  >>>     hello         10<<<
+ (1 row)
+ 
+ select sprintf('>>>%-10s<<<', 'hello');
+      sprintf      
+ ------------------
+  >>>hello     <<<
+ (1 row)
+ 
+ select sprintf('>>>%5.2<<<', 'abcde');
+ ERROR:  unsupported sprintf format tag '<'
+ select sprintf('>>>%*s<<<', 10, 'abcdef');
+      sprintf      
+ ------------------
+  >>>    abcdef<<<
+ (1 row)
+ 
+ select sprintf('>>>%*s<<<', 10); -- error
+ ERROR:  too few parameters specified for printf function
+ select sprintf('%010d', 10);
+   sprintf   
+ ------------
+  0000000010
+ (1 row)
+ 
+ select sprintf('%.6d', 10);
+  sprintf 
+ ---------
+  000010
+ (1 row)
+ 
+ select sprintf('%d', 100.0/3.0);
+  sprintf 
+ ---------
+  33
+ (1 row)
+ 
+ select sprintf('%e', 100.0/3.0);
+    sprintf    
+ --------------
+  3.333333e+01
+ (1 row)
+ 
+ select sprintf('%f', 100.0/3.0);
+   sprintf  
+ -----------
+  33.333333
+ (1 row)
+ 
+ select sprintf('%g', 100.0/3.0);
+  sprintf 
+ ---------
+  33.3333
+ (1 row)
+ 
+ select sprintf('%7.4e', 100.0/3.0);
+   sprintf   
+ ------------
+  3.3333e+01
+ (1 row)
+ 
+ select sprintf('%7.4f', 100.0/3.0);
+  sprintf 
+ ---------
+  33.3333
+ (1 row)
+ 
+ select sprintf('%7.4g', 100.0/3.0);
+  sprintf 
+ ---------
+    33.33
+ (1 row)
+ 
+ select sprintf('%d', NULL);
+  sprintf 
+ ---------
+  <NULL>
+ (1 row)
+ 
+ select substitute('second parameter is $2 and first parameter is $1', 'FIRST', 'SECOND');
+                        substitute                        
+ ---------------------------------------------------------
+  second parameter is SECOND and first parameter is FIRST
+ (1 row)
+ 
+ -- should fail
+ select substitute('third parameter is $3 and first parameter is $1', 'FIRST', 'SECOND');
+ ERROR:  positional placeholder "$3" is not valid
+ select substitute(' NULL parameter is $1', NULL);
+         substitute         
+ ---------------------------
+   NULL parameter is <NULL>
+ (1 row)
+ 
*** ./contrib/stringfunc/Makefile.orig	2010-09-29 08:11:59.217397393 +0200
--- ./contrib/stringfunc/Makefile	2010-09-29 08:10:48.364270479 +0200
***************
*** 0 ****
--- 1,20 ----
+ # $PostgreSQL: pgsql/contrib/stringfunc/Makefile,v 1.23 2009/08/28 20:26:18 petere Exp $
+ 
+ MODULE_big = stringfunc
+ OBJS= stringfunc.o
+ 
+ DATA_built = stringfunc.sql
+ DATA = uninstall_stringfunc.sql
+ REGRESS = stringfunc
+ 
+ ifdef USE_PGXS
+ PG_CONFIG = pg_config
+ PGXS := $(shell $(PG_CONFIG) --pgxs)
+ include $(PGXS)
+ else
+ subdir = contrib/stringfunc
+ top_builddir = ../..
+ include $(top_builddir)/src/Makefile.global
+ include $(top_srcdir)/contrib/contrib-global.mk
+ endif
+ 
*** ./contrib/stringfunc/sql/stringfunc.sql.orig	2010-09-29 08:11:52.441397426 +0200
--- ./contrib/stringfunc/sql/stringfunc.sql	2010-09-29 08:10:48.364270479 +0200
***************
*** 0 ****
--- 1,28 ----
+ SET client_min_messages = warning;
+ \set ECHO none
+ \i stringfunc.sql
+ \set ECHO all
+ RESET client_min_messages;
+ 
+ -- sprintf test
+ select sprintf('>>>%10s %10d<<<', 'hello', 10);
+ select sprintf('>>>%-10s<<<', 'hello');
+ select sprintf('>>>%5.2<<<', 'abcde');
+ select sprintf('>>>%*s<<<', 10, 'abcdef');
+ select sprintf('>>>%*s<<<', 10); -- error
+ select sprintf('%010d', 10);
+ select sprintf('%.6d', 10);
+ 
+ select sprintf('%d', 100.0/3.0);
+ select sprintf('%e', 100.0/3.0);
+ select sprintf('%f', 100.0/3.0);
+ select sprintf('%g', 100.0/3.0);
+ select sprintf('%7.4e', 100.0/3.0);
+ select sprintf('%7.4f', 100.0/3.0);
+ select sprintf('%7.4g', 100.0/3.0);
+ select sprintf('%d', NULL);
+ 
+ select substitute('second parameter is $2 and first parameter is $1', 'FIRST', 'SECOND');
+ -- should fail
+ select substitute('third parameter is $3 and first parameter is $1', 'FIRST', 'SECOND');
+ select substitute(' NULL parameter is $1', NULL);
\ No newline at end of file
*** ./contrib/stringfunc/stringfunc.c.orig	2010-09-29 08:12:07.209397369 +0200
--- ./contrib/stringfunc/stringfunc.c	2010-09-29 09:42:33.306272084 +0200
***************
*** 0 ****
--- 1,666 ----
+ #include "postgres.h"
+ #include "string.h"
+ 
+ #include "catalog/pg_type.h"
+ #include "lib/stringinfo.h"
+ #include "mb/pg_wchar.h"
+ #include "parser/parse_coerce.h"
+ #include "utils/array.h"
+ #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
+ 
+ PG_MODULE_MAGIC;
+ 
+ #define CHECK_PAD(symbol, pad_value)	\
+ do { \
+ 	if (pdesc->flags & pad_value)		\
+ 		ereport(ERROR,  	\
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
+ 				 errmsg("broken sprintf format"),          \
+ 				 errdetail("Format string is '%s'.", TextDatumGetCString(fmt)), \
+ 				 errhint("Symbol '%c' can be used only one time.", symbol))); \
+ 	pdesc->flags |= pad_value; \
+ } while(0);
+ 
+ /*
+  * string functions
+  */
+ Datum	stringfunc_sprintf(PG_FUNCTION_ARGS);
+ Datum	stringfunc_sprintf_nv(PG_FUNCTION_ARGS);
+ Datum	stringfunc_substitute(PG_FUNCTION_ARGS);
+ Datum	stringfunc_substitute_nv(PG_FUNCTION_ARGS);
+ 
+ /*
+  * V1 registrations
+  */
+ PG_FUNCTION_INFO_V1(stringfunc_sprintf);
+ PG_FUNCTION_INFO_V1(stringfunc_sprintf_nv);
+ PG_FUNCTION_INFO_V1(stringfunc_substitute);
+ PG_FUNCTION_INFO_V1(stringfunc_substitute_nv);
+ 
+ typedef enum {
+     stringfunc_ZERO       =   1,
+     stringfunc_SPACE      =   2,
+     stringfunc_PLUS       =   4,
+     stringfunc_MINUS      =   8,
+     stringfunc_STAR_WIDTH =  16,
+     stringfunc_SHARP      =  32,
+     stringfunc_WIDTH      =  64,
+     stringfunc_PRECISION  = 128,
+     stringfunc_STAR_PRECISION = 256
+ } PlaceholderTags;
+ 
+ typedef struct {
+ 	int	flags;
+ 	char		field_type;
+ 	char		lenmod;
+ 	int32		width;
+ 	int32		precision;
+ } FormatPlaceholderData;
+ 
+ typedef FormatPlaceholderData *PlaceholderDesc;
+ 
+ static Datum 
+ castValueTo(Datum value, Oid targetTypeId, Oid inputTypeId)
+ {
+ 	Oid		funcId;
+ 	CoercionPathType	pathtype;
+ 	FmgrInfo	finfo;
+ 	Datum	   result;
+ 
+ 	if (inputTypeId != UNKNOWNOID)
+ 		pathtype = find_coercion_pathway(targetTypeId, inputTypeId, 
+ 									COERCION_EXPLICIT, 
+ 									&funcId);
+ 	else
+ 		pathtype = COERCION_PATH_COERCEVIAIO;
+ 
+ 	switch (pathtype)
+ 	{
+ 		case COERCION_PATH_RELABELTYPE:
+ 			result = value;
+ 			break;
+ 		case COERCION_PATH_FUNC:
+ 			{
+ 				Assert(OidIsValid(funcId));
+ 				
+ 				fmgr_info(funcId, &finfo);
+ 				result = FunctionCall1(&finfo, value);
+ 			}
+ 			break;
+ 
+ 		case COERCION_PATH_COERCEVIAIO:
+ 			{
+ 				Oid                     typoutput;
+ 				Oid			typinput;
+ 				bool            typIsVarlena;
+ 				Oid		typIOParam;
+ 				char 	*extval;
+ 
+ 				getTypeOutputInfo(inputTypeId, &typoutput, &typIsVarlena);
+ 				extval = OidOutputFunctionCall(typoutput, value);
+ 				
+ 				getTypeInputInfo(targetTypeId, &typinput, &typIOParam);
+ 				result = OidInputFunctionCall(typinput, extval, typIOParam, -1);
+ 			}
+ 			break;
+ 
+ 		default:
+ 			elog(ERROR, "failed to find conversion function from %s to %s",
+ 					format_type_be(inputTypeId), format_type_be(targetTypeId));
+ 			/* be compiler quiet */
+ 			result = (Datum) 0;
+ 	}
+ 
+ 	return result;
+ }
+ 
+ /*
+  * parse and verify sprintf parameter 
+  *
+  *      %[flags][width][.precision]specifier
+  *
+  */
+ static char *
+ parsePlaceholder(char *src, char *end_ptr, PlaceholderDesc pdesc, text *fmt)
+ {
+ 	char		c;
+ 
+ 	pdesc->field_type = '\0';
+ 	pdesc->lenmod = '\0';
+ 	pdesc->flags = 0;
+ 	pdesc->width = 0;
+ 	pdesc->precision = 0;
+ 
+ 	while (src < end_ptr && pdesc->field_type == '\0')
+ 	{
+ 		c = *++src;
+ 
+ 		switch (c)
+ 		{
+ 			case '0':
+ 				CHECK_PAD('0', stringfunc_ZERO);
+ 				break;
+ 			case ' ':
+ 				CHECK_PAD(' ', stringfunc_SPACE);
+ 				break;
+ 			case '+':
+ 				CHECK_PAD('+', stringfunc_PLUS);
+ 				break;
+ 			case '-':
+ 				CHECK_PAD('-', stringfunc_MINUS);
+ 				break;
+ 			case '*':
+ 				CHECK_PAD('*', stringfunc_STAR_WIDTH);
+ 				break;
+ 			case '#':
+ 				CHECK_PAD('#', stringfunc_SHARP);
+ 				break;
+ 			case 'o': case 'i': case 'e': case 'E': case 'f': 
+ 			case 'g': case 'd': case 's': case 'x': case 'X': 
+ 				pdesc->field_type = *src;
+ 				break;
+ 			case '1': case '2': case '3': case '4':
+ 			case '5': case '6': case '7': case '8': case '9':
+ 				CHECK_PAD('9', stringfunc_WIDTH);
+ 				pdesc->width = c - '0';
+ 				while (src < end_ptr && isdigit(src[1]))
+ 					pdesc->width = pdesc->width * 10 + *++src - '0';
+ 				break;
+ 			case '.':
+ 				if (src < end_ptr)
+ 				{
+ 					if (src[1] == '*')
+ 					{
+ 						CHECK_PAD('.', stringfunc_STAR_PRECISION);
+ 						src++;
+ 					}
+ 					else
+ 					{
+ 						/*
+ 						 * when no one digit is entered, then precision
+ 						 * is zero - digits are optional.
+ 						 */
+ 						CHECK_PAD('.', stringfunc_PRECISION);
+ 						while (src < end_ptr && isdigit(src[1]))
+ 						{
+ 							pdesc->precision = pdesc->precision * 10 + *++src - '0';
+ 						}
+ 					}
+ 				}
+ 				else 
+ 					ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							 errmsg("broken sprintf format"),
+ 							 errdetail("missing precision value")));
+ 				break;
+ 
+ 			default:
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("unsupported sprintf format tag '%.*s'", pg_mblen(src), src)));
+ 		}
+ 	}
+ 
+ 	if (pdesc->field_type == '\0')
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("broken sprintf format")));
+ 
+ 	return src;
+ }
+ 
+ static char *
+ currentFormat(StringInfo str, PlaceholderDesc pdesc)
+ {
+ 	resetStringInfo(str);
+ 	appendStringInfoChar(str,'%');
+ 
+ 	if (pdesc->flags & stringfunc_ZERO)
+ 		appendStringInfoChar(str, '0');
+ 
+ 	if (pdesc->flags & stringfunc_MINUS)
+ 		appendStringInfoChar(str, '-');
+ 
+ 	if (pdesc->flags & stringfunc_PLUS)
+ 		appendStringInfoChar(str, '+');
+ 
+ 	if (pdesc->flags & stringfunc_SPACE)
+ 		appendStringInfoChar(str, ' ');
+ 
+ 	if (pdesc->flags & stringfunc_SHARP)
+ 		appendStringInfoChar(str, '#');
+ 
+ 	if ((pdesc->flags & stringfunc_WIDTH) || (pdesc->flags & stringfunc_STAR_WIDTH))
+ 		appendStringInfoChar(str, '*');
+ 
+ 	if ((pdesc->flags & stringfunc_PRECISION) || (pdesc->flags & stringfunc_STAR_PRECISION))
+ 		appendStringInfoString(str, ".*");
+ 
+ 	/* Append l or ll. Decision is based on value of INT64_FORMAT */
+ 	if (pdesc->lenmod == 'l')
+ 	{
+ 		if (strcmp(INT64_FORMAT, "%lld") == 0)
+ 			appendStringInfoString(str, "ll");
+ 		else
+ 			appendStringInfoString(str, "l");
+ 	}
+ 	else if (pdesc->lenmod != '\0')
+ 		appendStringInfoChar(str, pdesc->lenmod);
+ 
+ 	appendStringInfoChar(str, pdesc->field_type);
+ 
+ 	return str->data;
+ }
+ 
+ /*
+  * simulate %+width.precion%s format of sprintf function 
+  */
+ static void 
+ append_string(StringInfo str,  PlaceholderDesc pdesc, char *string)
+ {
+ 	int	nchars = 0;				/* length of substring in chars */
+ 	int	binlen = 0;				/* length of substring in bytes */
+ 
+ 	/*
+ 	 * apply precision - it means "show only first n chars", for strings - this flag is 
+ 	 * ignored for proprietary tags %lq and iq, because we can't to show a first n chars 
+ 	 * from possible quoted value. 
+ 	 */
+ 	if (pdesc->flags & stringfunc_PRECISION && pdesc->field_type != 'q')
+ 	{
+ 		char *ptr = string;
+ 		int	  len = pdesc->precision;
+ 
+ 		if (pg_database_encoding_max_length() > 1)
+ 		{
+ 			while (*ptr && len > 0)
+ 			{
+ 				ptr += pg_mblen(ptr);
+ 				len--;
+ 				nchars++;
+ 			}
+ 		}
+ 		else
+ 		{
+ 			while (*ptr && len > 0)
+ 			{
+ 				ptr++;
+ 				len--;
+ 				nchars++;
+ 			}
+ 		}
+ 
+ 		binlen = ptr - string;
+ 	}
+ 	else
+ 	{
+ 		/* there isn't precion specified, show complete string */
+ 		nchars = pg_mbstrlen(string);
+ 		binlen = strlen(string);
+ 	}
+ 
+ 	/* when width is specified, then we have to solve left or right align */
+ 	if (pdesc->flags & stringfunc_WIDTH)
+ 	{
+ 		if (pdesc->width > nchars)
+ 		{
+ 			/* add neccessary spaces to begin or end */
+ 			if (pdesc->flags & stringfunc_MINUS)
+ 			{
+ 				/* allign to left */
+ 				appendBinaryStringInfo(str, string, binlen);
+ 				appendStringInfoSpaces(str, pdesc->width - nchars);
+ 			}
+ 			else
+ 			{
+ 				/* allign to right */
+ 				appendStringInfoSpaces(str, pdesc->width - nchars);
+ 				appendBinaryStringInfo(str, string, binlen);
+ 			}
+ 
+ 		}
+ 		else
+ 			/* just copy result to output */
+ 			appendBinaryStringInfo(str, string, binlen);
+ 	}
+ 	else
+ 		/* just copy result to output */
+ 		appendBinaryStringInfo(str, string, binlen);
+ }
+ 
+ /*
+  * Set width and precision when they are defined dynamicaly
+  */
+ static 
+ int setWidthAndPrecision(PlaceholderDesc pdesc, FunctionCallInfoData *fcinfo, int current)
+ {
+ 
+ 	/* 
+ 	 * don't allow ambiguous definition
+ 	 */
+ 	if ((pdesc->flags & stringfunc_WIDTH) && (pdesc->flags & stringfunc_STAR_WIDTH))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("broken sprintf format"),
+ 				 errdetail("ambiguous width definition")));
+ 
+ 	if ((pdesc->flags & stringfunc_PRECISION) && (pdesc->flags & stringfunc_STAR_PRECISION))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("broken sprintf format"),
+ 				 errdetail("ambiguous precision definition")));
+ 	if (pdesc->flags & stringfunc_STAR_WIDTH)
+ 	{
+ 		if (current >= PG_NARGS())
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("too few parameters specified for printf function")));
+ 
+ 		if (PG_ARGISNULL(current))
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 				 errmsg("null value not allowed"),
+ 				 errhint("width (%dth) arguments is NULL", current)));
+ 
+ 		pdesc->width = DatumGetInt32(castValueTo(PG_GETARG_DATUM(current), INT4OID, 
+ 									get_fn_expr_argtype(fcinfo->flinfo, current)));
+ 		/* reset flag */
+ 		pdesc->flags ^= stringfunc_STAR_WIDTH;
+ 		pdesc->flags |= stringfunc_WIDTH;
+ 		current += 1;
+ 	}
+ 
+ 	if (pdesc->flags & stringfunc_STAR_PRECISION)
+ 	{
+ 		if (current >= PG_NARGS())
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("too few parameters specified for printf function")));
+ 
+ 		if (PG_ARGISNULL(current))
+ 			ereport(ERROR,
+ 				(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 				 errmsg("null value not allowed"),
+ 				 errhint("width (%dth) arguments is NULL", current)));
+ 
+ 		pdesc->precision = DatumGetInt32(castValueTo(PG_GETARG_DATUM(current), INT4OID, 
+ 									get_fn_expr_argtype(fcinfo->flinfo, current)));
+ 		/* reset flags */
+ 		pdesc->flags ^= stringfunc_STAR_PRECISION;
+ 		pdesc->flags |= stringfunc_PRECISION;
+ 		current += 1;
+ 	}
+ 
+ 	return current;
+ }
+ 
+ /*
+  * sprintf function - it is wrapper for libc vprintf function
+  *
+  *    ensure PostgreSQL -> C casting
+  */
+ Datum
+ stringfunc_sprintf(PG_FUNCTION_ARGS)
+ {
+ 	text	   *fmt;
+ 	StringInfoData   str;
+ 	StringInfoData   format_str;
+ 	char		*cp;
+ 	int			i = 1;
+ 	size_t		len;
+ 	char		*start_ptr,
+ 				*end_ptr;
+ 	FormatPlaceholderData		pdesc;
+ 	text *result;
+ 
+ 	Oid                     typoutput;
+ 	bool            typIsVarlena;
+ 	Datum 	value;
+ 	Oid valtype;
+ 
+ 	/* When format string is null, returns null */
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 
+ 	fmt = PG_GETARG_TEXT_PP(0);
+ 	len = VARSIZE_ANY_EXHDR(fmt);
+ 	start_ptr = VARDATA_ANY(fmt);
+ 	end_ptr = start_ptr + len - 1;
+ 
+ 	initStringInfo(&str);
+ 	initStringInfo(&format_str);
+ 
+ 	for (cp = start_ptr; cp <= end_ptr; cp++)
+ 	{
+ 		if (cp[0] == '%')
+ 		{
+ 			/* when cp is not pointer on last char, check %% */
+ 			if (cp < end_ptr && cp[1] == '%')
+ 			{
+ 				appendStringInfoChar(&str, cp[1]);
+ 				cp++;
+ 				continue;
+ 			}
+ 
+ 			cp = parsePlaceholder(cp, end_ptr, &pdesc, fmt);
+ 			i = setWidthAndPrecision(&pdesc, fcinfo, i);
+ 
+ 			if (i >= PG_NARGS())
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("too few parameters specified for printf function")));
+ 
+ 			if (!PG_ARGISNULL(i))
+ 			{
+ 				/* append n-th value */
+ 				value = PG_GETARG_DATUM(i);
+ 				valtype = get_fn_expr_argtype(fcinfo->flinfo, i);
+ 
+ 				/* convert value to target type */
+ 				switch (pdesc.field_type)
+ 				{
+ 					case 'o': case 'd': case 'i': case 'x': case 'X':
+ 						{
+ 							int64	target_value;
+ 							const char 		*format;
+ 
+ 							pdesc.lenmod = 'l';
+ 							target_value = DatumGetInt64(castValueTo(value, INT8OID, valtype));
+ 							format = currentFormat(&format_str, &pdesc);
+ 
+ 							if ((pdesc.flags & stringfunc_WIDTH) && (pdesc.flags & stringfunc_PRECISION))
+ 								appendStringInfo(&str, format, pdesc.width, pdesc.precision, target_value);
+ 							else if (pdesc.flags & stringfunc_WIDTH)
+ 								appendStringInfo(&str, format, pdesc.width, target_value);
+ 							else if (pdesc.flags & stringfunc_PRECISION)
+ 								appendStringInfo(&str, format, pdesc.precision, target_value);
+ 							else
+ 								appendStringInfo(&str, format, target_value);
+ 						}
+ 						break;
+ 					case 'e': case 'f': case 'g': case 'G': case 'E':
+ 						{
+ 							float8	target_value;
+ 							const char 		*format;
+ 
+ 							target_value = DatumGetFloat8(castValueTo(value, FLOAT8OID, valtype));
+ 							format = currentFormat(&format_str, &pdesc);
+ 
+ 							if ((pdesc.flags & stringfunc_WIDTH) && (pdesc.flags & stringfunc_PRECISION))
+ 								appendStringInfo(&str, format, pdesc.width, pdesc.precision, target_value);
+ 							else if (pdesc.flags & stringfunc_WIDTH)
+ 								appendStringInfo(&str, format, pdesc.width, target_value);
+ 							else if (pdesc.flags & stringfunc_PRECISION)
+ 								appendStringInfo(&str, format, pdesc.precision, target_value);
+ 							else
+ 								appendStringInfo(&str, format, target_value);
+ 						}
+ 						break;
+ 					case 's':
+ 						{
+ 							char		*target_value;
+ 
+ 							getTypeOutputInfo(valtype, &typoutput, &typIsVarlena);
+ 							target_value = OidOutputFunctionCall(typoutput, value);
+ 							
+ 							append_string(&str, &pdesc, target_value);
+ 							pfree(target_value);
+ 						}
+ 						break;
+ 					default:
+ 						/* don't be happen - formats are checked in parsing stage */
+ 						elog(ERROR, "unknown format: %c", pdesc.field_type);
+ 				}
+ 			}
+ 			else
+ 				/* append a NULL string */
+ 				append_string(&str, &pdesc, "<NULL>");
+ 			i++;
+ 		}
+ 		else
+ 			appendStringInfoChar(&str, cp[0]);
+ 	}
+ 
+ 	/* check if all arguments are used */
+ 	if (i != PG_NARGS())
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("too many parameters for printf function")));
+ 	result = cstring_to_text_with_len(str.data, str.len);
+ 
+ 	pfree(str.data);
+ 	pfree(format_str.data);
+ 
+ 	PG_RETURN_TEXT_P(result);
+ }
+ 
+ /*
+  * only wrapper
+  */
+ Datum
+ stringfunc_sprintf_nv(PG_FUNCTION_ARGS)
+ {
+ 	return stringfunc_sprintf(fcinfo);
+ }
+ 
+ /*
+  * Substitute a positional parameters by value
+  */
+ Datum
+ stringfunc_substitute(PG_FUNCTION_ARGS)
+ {
+ 	text	   *fmt;
+ 	StringInfoData		str;
+ 	char		*cp;
+ 	size_t		len;
+ 	char		*start_ptr;
+ 	char		*end_ptr;
+ 	text	*result;
+ 	ArrayType *array;
+ 	Oid			elmtype;
+ 	int16		elmlen;
+ 	bool		elmbyval;
+ 	char		elmalign;
+ 	int			num_elems = 0;
+ 	Datum	   *elem_values;
+ 	bool	   *elem_nulls;
+ 
+ 	fmt = PG_GETARG_TEXT_PP(0);
+ 	len = VARSIZE_ANY_EXHDR(fmt);
+ 	start_ptr = VARDATA_ANY(fmt);
+ 	end_ptr = start_ptr + len - 1;
+ 
+ 	if (PG_NARGS() == 2)
+ 	{
+ 		array = PG_GETARG_ARRAYTYPE_P(1);
+ 		elmtype = ARR_ELEMTYPE(array);
+ 		get_typlenbyvalalign(elmtype, &elmlen, &elmbyval, &elmalign);
+ 
+ 		deconstruct_array(array, elmtype,
+ 						  elmlen, elmbyval, elmalign,
+ 						  &elem_values, &elem_nulls,
+ 						  &num_elems);
+ 	}
+ 
+ 	initStringInfo(&str);
+ 	for (cp = start_ptr; cp <= end_ptr; cp++)
+ 	{
+ 		/*
+ 		 * there are allowed escape char - '\'
+ 		 */
+ 		if (cp[0] == '\\')
+ 		{
+ 			/* check next char */
+ 			if (cp < end_ptr)
+ 			{
+ 				switch (cp[1])
+ 				{
+ 					case '\\':
+ 					case '$':
+ 						appendStringInfoChar(&str, cp[1]);
+ 						break;
+ 
+ 					default:
+ 						/* 
+ 						 * because unsupported symbols should be a multibyte chars,
+ 						 * we cannot to use a %c formating. We have take a complete
+ 						 * multibyte char length and show it via %.*s.
+ 						 */
+ 						ereport(ERROR,
+ 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							 errmsg("unsuported escape sequence \"\\%.*s\"", pg_mblen(&cp[1]), &cp[1])));
+ 				}
+ 				cp++;
+ 			}
+ 			else
+ 				elog(ERROR, "broken escape sequence");
+ 		}
+ 		else if (cp[0] == '$')
+ 		{
+ 			long pos;
+ 			char *endptr;
+ 
+ 			/* initial check */
+ 			if (cp == end_ptr)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("missing a parameter position specification")));
+ 
+ 			if (!isdigit(cp[1]))
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("expected a numeric value")));
+ 
+ 			pos = strtol(&cp[1], &endptr, 10);
+ 			cp = endptr - 1;
+ 
+ 			if (pos < 1 || pos > num_elems)
+ 				ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("positional placeholder \"$%ld\" is not valid", pos)));
+ 
+ 			if (!elem_nulls[pos - 1])
+ 				appendStringInfoString(&str,  text_to_cstring(DatumGetTextP(elem_values[pos - 1])));
+ 			else
+ 				appendStringInfoString(&str, "<NULL>");
+ 		}
+ 		else
+ 			appendStringInfoChar(&str, cp[0]);
+ 	}
+ 
+ 	result = cstring_to_text_with_len(str.data, str.len);
+ 	pfree(str.data);
+ 
+ 	PG_RETURN_TEXT_P(result);
+ }
+ 
+ /*
+  * Non variadic text_substitute function - only wrapper
+  *   Print and check format string
+  */
+ Datum
+ stringfunc_substitute_nv(PG_FUNCTION_ARGS)
+ {
+ 	return stringfunc_substitute(fcinfo);
+ }
*** ./contrib/stringfunc/stringfunc.sql.in.orig	2010-09-29 08:12:51.329397362 +0200
--- ./contrib/stringfunc/stringfunc.sql.in	2010-09-29 08:10:48.366270806 +0200
***************
*** 0 ****
--- 1,25 ----
+ /* $PostgreSQL: pgsql/contrib/stringfunc/stringfunc.sql.in,v 1.25 2009/06/11 18:30:03 tgl Exp $ */
+ 
+ -- Adjust this setting to control where the objects get created.
+ SET search_path = public;
+ 
+ CREATE OR REPLACE FUNCTION sprintf(fmt text, VARIADIC args "any")
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_sprintf'
+ LANGUAGE C STABLE;
+ 
+ CREATE OR REPLACE FUNCTION sprintf(fmt text)
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_sprintf_nv'
+ LANGUAGE C STABLE;
+ 
+ CREATE OR REPLACE FUNCTION substitute(fmt text, VARIADIC args text[])
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_substitute'
+ LANGUAGE C STABLE;
+ 
+ CREATE OR REPLACE FUNCTION substitute(fmt text)
+ RETURNS text 
+ AS '$libdir/stringfunc','stringfunc_substitute_nv'
+ LANGUAGE C STABLE;
+ 
*** ./contrib/stringfunc/uninstall_stringfunc.sql.orig	2010-09-29 08:12:56.353273675 +0200
--- ./contrib/stringfunc/uninstall_stringfunc.sql	2010-09-29 08:10:48.366270806 +0200
***************
*** 0 ****
--- 1,10 ----
+ /* $PostgreSQL: pgsql/contrib/stringfunc/uninstall_stringfunc.sql,v 1.8 2008/04/14 17:05:32 tgl Exp $ */
+ 
+ -- Adjust this setting to control where the objects get dropped.
+ SET search_path = public;
+ 
+ DROP FUNCTION sprintf(fmt text, VARIADIC args "any");
+ DROP FUNCTION sprintf(fmt text);
+ DROP FUNCTION substitute(fmt text, VARIADIC args text[]);
+ DROP FUNCTION substitute(fmt text);
+ 
*** ./doc/src/sgml/contrib.sgml.orig	2010-09-29 08:10:11.529398663 +0200
--- ./doc/src/sgml/contrib.sgml	2010-09-29 08:10:48.367270236 +0200
***************
*** 115,120 ****
--- 115,121 ----
   &seg;
   &contrib-spi;
   &sslinfo;
+  &stringfunc;
   &tablefunc;
   &test-parser;
   &tsearch2;
*** ./doc/src/sgml/filelist.sgml.orig	2010-09-29 08:10:11.530395997 +0200
--- ./doc/src/sgml/filelist.sgml	2010-09-29 08:10:48.367270236 +0200
***************
*** 127,132 ****
--- 127,133 ----
  <!entity seg             SYSTEM "seg.sgml">
  <!entity contrib-spi     SYSTEM "contrib-spi.sgml">
  <!entity sslinfo         SYSTEM "sslinfo.sgml">
+ <!entity stringfunc      SYSTEM "stringfunc.sgml">
  <!entity tablefunc       SYSTEM "tablefunc.sgml">
  <!entity test-parser     SYSTEM "test-parser.sgml">
  <!entity tsearch2        SYSTEM "tsearch2.sgml">
*** ./doc/src/sgml/stringfunc.sgml.orig	2010-09-29 08:42:33.899272275 +0200
--- ./doc/src/sgml/stringfunc.sgml	2010-09-29 09:04:00.493270789 +0200
***************
*** 0 ****
--- 1,67 ----
+ <!-- $PostgreSQL: pgsql/doc/src/sgml/stringfunc.sgml,v 1.2 2008/09/12 18:29:49 tgl Exp $ -->
+ 
+ <sect1 id="stringfunc">
+  <title>stringfunc</title>
+ 
+  <indexterm zone="stringfunc">
+   <primary>stringfunc</primary>
+  </indexterm>
+ 
+  <para>
+   The <filename>stringfunc</> module provides a additional function
+   for operation over strings. These functions can be used as patter
+   for developing a variadic functions.
+  </para>
+ 
+  <sect2>
+   <title>How to Use It</title>
+ 
+   <para>
+    Here's a simple example of usage:
+ 
+   <programlisting>
+    SELECT sprintf('formated number: %10d',10);
+    SELECT substitute('file '$1' doesn''t exists', '/var/log/applog');
+   </programlisting>
+   </para>
+  </sect2>
+ 
+  <sect2>
+   <title><filename>stringfunc</> Functions and Operators</title>
+ 
+   <table id="stringfunc-func-table">
+    <title><filename>stringfunc</> Functions</title>
+ 
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+       <entry>Example</entry>
+       <entry>Result</entry>
+      </row>
+     </thead>
+ 
+     <tbody>
+      <row>
+       <entry><function>sprintf(formatstr [, params])</function></entry>
+       <entry><type>text</type></entry>
+       <entry>simplyfied version of libc sprintf function - it doesn't support 
+       positional parameters and it will do necessary conversions automaticaly.</entry>
+       <entry><literal>sprintf('Hello %10s','World')</literal></entry>
+       <entry><literal>Hello      World</literal></entry>
+      </row>
+ 
+      <row>
+       <entry><function>substitute(formatstr [, params])</function></entry>
+       <entry><type>text</type></entry>
+       <entry>replace a positional placeholers like $n by params</entry>
+       <entry><literal>substitute('$1,$2,$2,$1','A','B');</literal></entry>
+       <entry><literal>A,B,B,A</literal></entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+  </sect2>
+ </sect1>
#25Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#24)
Re: string function - "format" function proposal

On Wed, Sep 29, 2010 at 3:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

[ updated patch, in response to a review from Itagaki Takahiro ]

This patch appears to be waiting for a second round of review.
Itagaki-san, are you planning to do that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Robert Haas (#25)
Re: string function - "format" function proposal

On Thu, Oct 14, 2010 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Sep 29, 2010 at 3:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

[ updated patch, in response to a review from Itagaki Takahiro ]

This patch appears to be waiting for a second round of review.
Itagaki-san, are you planning to do that?

I can, but I was waiting for other people's comments about the design:
- format() in core, that implements %s, %i, and %l.
- substitute() for $n format and sprintf() that partially implements
the same function in C in contrib/stringfunc.

I don't like having three similar functions for the same purpose,
but Pavel said they are the best solutions. What will be our consensus?

--
Itagaki Takahiro

#27Alvaro Herrera
alvherre@commandprompt.com
In reply to: Itagaki Takahiro (#26)
Re: string function - "format" function proposal

Excerpts from Itagaki Takahiro's message of mié oct 13 23:03:16 -0300 2010:

On Thu, Oct 14, 2010 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Sep 29, 2010 at 3:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

[ updated patch, in response to a review from Itagaki Takahiro ]

This patch appears to be waiting for a second round of review.
Itagaki-san, are you planning to do that?

I can, but I was waiting for other people's comments about the design:
- format() in core, that implements %s, %i, and %l.
- substitute() for $n format and sprintf() that partially implements
the same function in C in contrib/stringfunc.

I don't like having three similar functions for the same purpose,
but Pavel said they are the best solutions. What will be our consensus?

I don't have much love for moving the position stuff ($n) out of the main
function either. I've been meaning to take a look at how hard it would
be to integrate that into format() in core -- no luck :-(

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#28Robert Haas
robertmhaas@gmail.com
In reply to: Itagaki Takahiro (#26)
Re: string function - "format" function proposal

On Wed, Oct 13, 2010 at 10:03 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Thu, Oct 14, 2010 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Sep 29, 2010 at 3:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

[ updated patch, in response to a review from Itagaki Takahiro ]

This patch appears to be waiting for a second round of review.
Itagaki-san, are you planning to do that?

I can, but I was waiting for other people's comments about the design:
 - format() in core, that implements %s, %i, and %l.
 - substitute() for $n format and sprintf() that partially implements
   the same function in C in contrib/stringfunc.

I don't like having three similar functions for the same purpose,
but Pavel said they are the best solutions. What will be our consensus?

<rereads thread>

I agree with you. I think we should pick one implementation and just
go with it. There's nothing to say that Pavel can't distribute his
own code however he likes, but I don't think there's any compelling
reason for us to carry all that code in the main tree, even in
/contrib. Let's make format support %s, %i, and %l, as well as
allowing things like %$3l (meaning, escape the third argument as a
literal and interpolate here), and call it good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#29Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#28)
Re: string function - "format" function proposal

2010/10/14 Robert Haas <robertmhaas@gmail.com>:

On Wed, Oct 13, 2010 at 10:03 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Thu, Oct 14, 2010 at 10:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Sep 29, 2010 at 3:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

[ updated patch, in response to a review from Itagaki Takahiro ]

This patch appears to be waiting for a second round of review.
Itagaki-san, are you planning to do that?

I can, but I was waiting for other people's comments about the design:
 - format() in core, that implements %s, %i, and %l.
 - substitute() for $n format and sprintf() that partially implements
   the same function in C in contrib/stringfunc.

I don't like having three similar functions for the same purpose,
but Pavel said they are the best solutions. What will be our consensus?

<rereads thread>

I agree with you.  I think we should pick one implementation and just
go with it.  There's nothing to say that Pavel can't distribute his
own code however he likes, but I don't think there's any compelling
reason for us to carry all that code in the main tree, even in
/contrib.  Let's make format support %s, %i, and %l, as well as
allowing things like %$3l (meaning, escape the third argument as a
literal and interpolate here), and call it good.

my objection to put all to one functions was a format complexity and
little bit less readability - like %$3. But I am not strong it this.

Regards

Pavel Stehule

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#30Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Robert Haas (#28)
Re: string function - "format" function proposal

Let's make format support %s, %i, and %l, as well as
allowing things like %$3l (meaning, escape the third argument as a
literal and interpolate here), and call it good.

Your idea is:
% [ $ pos ] format -- ex. %$3l , %l
Escapes: %% => %

Just for information, $ and pos are reversed in C sprintf.
% [ pos $ ] format -- ex. %3$l , %l
Escapes: %% => %

IMHO, I like {} syntax as like as C# because the format strings are extensible.
{ pos [ : format ] } -- ex {3:l}, {3} (, and {l} could be also supported)
Escapes: {{ => {, }} => }

--
Itagaki Takahiro

#31Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#30)
Re: string function - "format" function proposal

Hello

2010/10/14 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

Let's make format support %s, %i, and %l, as well as
allowing things like %$3l (meaning, escape the third argument as a
literal and interpolate here), and call it good.

Your idea is:
 % [ $ pos ] format  -- ex. %$3l , %l
 Escapes: %% => %

Just for information, $ and pos are reversed in C sprintf.
 % [ pos $ ] format  -- ex. %3$l , %l
 Escapes: %% => %

ook - +1 for %3$l

IMHO, I like {} syntax as like as C# because the format strings are extensible.
 { pos [ : format ] } -- ex {3:l}, {3} (, and {l} could be also supported)
 Escapes: {{ => {, }} => }

I dislike it. The target usage for this function is plpgsql code. I
prefer a simply design - second sprintf is useles. More - {} can be
used in messages relative often and with your proposal, you have to
intensivelly use a escaping.

Regards

Pavel

Show quoted text

--
Itagaki Takahiro

#32Robert Haas
robertmhaas@gmail.com
In reply to: Itagaki Takahiro (#30)
Re: string function - "format" function proposal

On Thu, Oct 14, 2010 at 2:25 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

Let's make format support %s, %i, and %l, as well as
allowing things like %$3l (meaning, escape the third argument as a
literal and interpolate here), and call it good.

Your idea is:
 % [ $ pos ] format  -- ex. %$3l , %l
 Escapes: %% => %

Just for information, $ and pos are reversed in C sprintf.
 % [ pos $ ] format  -- ex. %3$l , %l
 Escapes: %% => %

Oh, woops. I intended to copy the way C works.

IMHO, I like {} syntax as like as C# because the format strings are extensible.
 { pos [ : format ] } -- ex {3:l}, {3} (, and {l} could be also supported)
 Escapes: {{ => {, }} => }

My personal preference (and Pavel's, I guess) is to use the C-like
syntax. But I wouldn't be upset if consensus congealed around some
other option.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#32)
Re: string function - "format" function proposal

On 10/14/2010 08:25 AM, Robert Haas wrote:

IMHO, I like {} syntax as like as C# because the format strings are extensible.
{ pos [ : format ] } -- ex {3:l}, {3} (, and {l} could be also supported)
Escapes: {{ => {, }} => }

My personal preference (and Pavel's, I guess) is to use the C-like
syntax. But I wouldn't be upset if consensus congealed around some
other option.

They're both somewhat arcane. But I think the C syntax is likely to be
more familiar to a wider group of users (including, for example, perl
hackers) than the C# syntax, and is to be preferred on those grounds alone.

cheers

andrew

#34Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Andrew Dunstan (#33)
Re: string function - "format" function proposal

On Thu, Oct 14, 2010 at 9:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

They're both somewhat arcane. But I think the C syntax is likely to be more
familiar to a wider group of users (including, for example, perl hackers)
than the C# syntax, and is to be preferred on those grounds alone.

OK, probably C syntax is the best design.
Then, let's merge format() and substitute() in the latest patch.

I have a comment about %i for identifier format. %i is also used in
printf(), so it would be better to choose another character, like %I.
(%l is ok, but would be %L if we choose %I.)
Implementation for sprintf() in strincfunc might not be used now,
but it will be a conflict when we also merge it to format() in the future.

--
Itagaki Takahiro

#35Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#34)
Re: string function - "format" function proposal

2010/10/15 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Thu, Oct 14, 2010 at 9:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

They're both somewhat arcane. But I think the C syntax is likely to be more
familiar to a wider group of users (including, for example, perl hackers)
than the C# syntax, and is to be preferred on those grounds alone.

OK, probably C syntax is the best design.
Then, let's merge format() and  substitute() in the latest patch.

I have a comment about %i for identifier format. %i is also used in
printf(), so it would be better to choose another character, like %I.
(%l is ok, but would be %L if we choose %I.)
Implementation for sprintf() in strincfunc might not be used now,
but it will be a conflict when we also merge it to format() in the future.

I like to see only lower case chars for tags. Using upper chars isn't
a good idea. The system in tags must not be same as I designed, but
there should be clean relation between tag and semantic. The current
system was simple %s string, %i identifier %l literal - if you don't
like %l or %i, then maybe %ls or %is - like "literal string" or "ident
string".

I don't think so merging sprintf and format can be good. Sprintf is
too complex - so long years users don't know specification well and
creating some like sprintf function can be messy for users. I like to
see accurate sprintf function in contrib - and some else in core.

Regards

Pavel Stehule

Show quoted text

--
Itagaki Takahiro

#36Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#35)
Re: string function - "format" function proposal

On Fri, Oct 15, 2010 at 12:59 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

then maybe %ls or %is - like "literal string" or "ident string".

Yeah, good idea!

I don't think so merging sprintf and format can be good. Sprintf is
too complex - so long years users don't know specification well and
creating some like sprintf function can be messy for users. I like to
see accurate sprintf function in contrib - and some else in core.

I agree that full-spec sprintf is too complex, but precision and
zero-full for numeric types are commonly used. I think someone
will ask us "Why don't have numeric formats though we have %s?".

--
Itagaki Takahiro

#37Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#36)
Re: string function - "format" function proposal

2010/10/15 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Fri, Oct 15, 2010 at 12:59 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

then maybe %ls or %is - like "literal string" or "ident string".

Yeah, good idea!

I don't think so merging sprintf and format can be good. Sprintf is
too complex - so long years users don't know specification well and
creating some like sprintf function can be messy for users. I like to
see accurate sprintf function in contrib - and some else in core.

I agree that full-spec sprintf is too complex, but precision and
zero-full for numeric types are commonly used. I think someone
will ask us "Why don't have numeric formats though we have %s?".

And the reply is - we have function to_char. I don't see any reason
why we have to have two independent formatting systems.

Pavel

Show quoted text

--
Itagaki Takahiro

#38Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#37)
Re: string function - "format" function proposal

On Fri, Oct 15, 2010 at 8:20 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

And the reply is - we have function to_char. I don't see any reason
why we have to have two independent formatting systems.

The formats for literal and identifier can be replaced to quote_nullable() and
quote_ident(), too. Features to write simple queries are constantly in demand.

--
Itagaki Takahiro

#39Robert Haas
robertmhaas@gmail.com
In reply to: Itagaki Takahiro (#36)
Re: string function - "format" function proposal

On Fri, Oct 15, 2010 at 12:55 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Fri, Oct 15, 2010 at 12:59 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

then maybe %ls or %is - like "literal string" or "ident string".

Yeah, good idea!

-1 from me. What does this do except make it more long-winded?

I don't think so merging sprintf and format can be good. Sprintf is
too complex - so long years users don't know specification well and
creating some like sprintf function can be messy for users. I like to
see accurate sprintf function in contrib - and some else in core.

I agree that full-spec sprintf is too complex, but precision and
zero-full for numeric types are commonly used. I think someone
will ask us "Why don't have numeric formats though we have %s?".

I think someone might also ask - why are you bothering to create this
at all? The amount of work that has been put into this is, IMHO, far
out of proportion to the value of the feature. As Pavel points out,
we already have perfectly good mechanisms for converting our various
data types to text. We do not need to invent new ones.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#39)
Re: string function - "format" function proposal

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 15, 2010 at 12:55 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

I agree that full-spec sprintf is too complex, but precision and
zero-full for numeric types are commonly used. I think someone
will ask us "Why don't have numeric formats though we have %s?".

I think someone might also ask - why are you bothering to create this
at all? The amount of work that has been put into this is, IMHO, far
out of proportion to the value of the feature. As Pavel points out,
we already have perfectly good mechanisms for converting our various
data types to text. We do not need to invent new ones.

I beg to differ. IMO to_char is a lot closer to the "sucks big-time"
end of the spectrum than the "perfectly good" end of the spectrum:
it's a bad implementation of a crummy design. I think a lot of people
would like to have something closer to sprintf-style formatting.

I think we should go into this with the idea that it might only do 10%
of what sprintf can do initially, but there will be pressure to cover a
lot of the other 90% eventually. So it would be a good idea to ensure
that we don't make any choices that are gratuitously incompatible with
standard sprintf format codes. In particular, I agree with the idea of
using %I not %i for identifiers --- in fact I'd go so far as to suggest
that all specifiers we invent, rather than borrowing from sprintf, be
upper-case.

regards, tom lane

#41Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#40)
Re: string function - "format" function proposal

On Fri, Oct 15, 2010 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 15, 2010 at 12:55 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

I agree that full-spec sprintf is too complex, but precision and
zero-full for numeric types are commonly used. I think someone
will ask us "Why don't have numeric formats though we have %s?".

I think someone might also ask - why are you bothering to create this
at all?  The amount of work that has been put into this is, IMHO, far
out of proportion to the value of the feature.  As Pavel points out,
we already have perfectly good mechanisms for converting our various
data types to text.  We do not need to invent new ones.

I beg to differ.  IMO to_char is a lot closer to the "sucks big-time"
end of the spectrum than the "perfectly good" end of the spectrum:
it's a bad implementation of a crummy design.  I think a lot of people
would like to have something closer to sprintf-style formatting.

I think we should go into this with the idea that it might only do 10%
of what sprintf can do initially, but there will be pressure to cover a
lot of the other 90% eventually.  So it would be a good idea to ensure
that we don't make any choices that are gratuitously incompatible with
standard sprintf format codes.  In particular, I agree with the idea of
using %I not %i for identifiers --- in fact I'd go so far as to suggest
that all specifiers we invent, rather than borrowing from sprintf, be
upper-case.

Hmm. I have a feeling that's going to be a rathole. Among other
problems, what about types other than strings and numbers? The thing
I most often need to format is a date, and the second most common one
is timestamp with time zone. The specification for sprintf is
ridiculously complicated with just the things C has as built-in types,
never mind SQL.

Then again, if I'm not the one who has to spend time in the rathole...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#41)
Re: string function - "format" function proposal

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 15, 2010 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think we should go into this with the idea that it might only do 10%
of what sprintf can do initially, but there will be pressure to cover a
lot of the other 90% eventually.

Hmm. I have a feeling that's going to be a rathole. Among other
problems, what about types other than strings and numbers?

I think the general solution is to split those off as subproblems.
If you've got a type that has special formatting requirements,
you can do

sprintf('foo is %s', format_my_type(value, other-arguments))

where format_my_type returns text. (So, in particular, you could use
to_char for this if it solved the particular need.)

Having said that, it might make sense to provide special case handling
of dates and timestamps, since that's definitely the most common case
where you might not be satisfied with the default conversion to text.

The specification for sprintf is
ridiculously complicated with just the things C has as built-in types,
never mind SQL.

Sure, but an awful lot of those bells and whistles turn out to be handy.
Personally I think the field width control options are the main thing
that sprintf has got over to_char, so I think we're going to want those
sooner rather than later.

Then again, if I'm not the one who has to spend time in the rathole...

Yeah, I'm not in a hurry to spend time on it either. I just foresee
that somebody will want to, and so I don't want a dead-end definition.

regards, tom lane

#43Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#42)
Re: string function - "format" function proposal

On Fri, Oct 15, 2010 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 15, 2010 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think we should go into this with the idea that it might only do 10%
of what sprintf can do initially, but there will be pressure to cover a
lot of the other 90% eventually.

Hmm.  I have a feeling that's going to be a rathole.  Among other
problems, what about types other than strings and numbers?

I think the general solution is to split those off as subproblems.
If you've got a type that has special formatting requirements,
you can do

sprintf('foo is %s', format_my_type(value, other-arguments))

where format_my_type returns text.  (So, in particular, you could use
to_char for this if it solved the particular need.)

Having said that, it might make sense to provide special case handling
of dates and timestamps, since that's definitely the most common case
where you might not be satisfied with the default conversion to text.

The specification for sprintf is
ridiculously complicated with just the things C has as built-in types,
never mind SQL.

Sure, but an awful lot of those bells and whistles turn out to be handy.

No doubt. The problem is that we're going to end up with those bells
and whistles in two places: in to_char or other type-specific
formatting functions, and again in format.

Personally I think the field width control options are the main thing
that sprintf has got over to_char, so I think we're going to want those
sooner rather than later.

Then again, if I'm not the one who has to spend time in the rathole...

Yeah, I'm not in a hurry to spend time on it either.  I just foresee
that somebody will want to, and so I don't want a dead-end definition.

Perhaps we could design a family of (heh, heh, undocumented) functions
called _format_helper(type, text), or something like that. format()
would call the format helper for the appropriate type with the datum
to be formatted and the sprintf-like format string as arguments. Or
maybe that isn't feasible, I'm just brainstorming.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#44Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Robert Haas (#43)
Re: string function - "format" function proposal

On Sat, Oct 16, 2010 at 7:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:

No doubt.  The problem is that we're going to end up with those bells
and whistles in two places: in to_char or other type-specific
formatting functions, and again in format.

If we decide to use C-like sprintf(), I think the only thing we can do
is to implement C-syntax as much as possible. Users will expect the
function behaves as sprintf, because it has the similar syntax.
It's not an item for now, but someone would request it at a future date.

BTW, the interoperability is why I proposed {} syntax. For example,
{1:YYYY-MM-DD} for date is expanded to to_char($1, 'YYYY-MM-DD').
(Maybe it's not so easy; It requires function lookups depending on types.)

--
Itagaki Takahiro

#45Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#44)
Re: string function - "format" function proposal

2010/10/18 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

On Sat, Oct 16, 2010 at 7:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:

No doubt.  The problem is that we're going to end up with those bells
and whistles in two places: in to_char or other type-specific
formatting functions, and again in format.

If we decide to use C-like sprintf(), I think the only thing we can do
is to implement C-syntax as much as possible. Users will expect the
function behaves as sprintf, because it has the similar syntax.
It's not an item for now, but someone would request it at a future date.

yes, it is reason why I wrote two functions - sprintf and format.

BTW, the interoperability is why I proposed {} syntax. For example,
{1:YYYY-MM-DD} for date is expanded to to_char($1, 'YYYY-MM-DD').
(Maybe it's not so easy; It requires function lookups depending on types.)

why this shorcut is necessary?

Regards

Pavel Stehule

Show quoted text

--
Itagaki Takahiro

#46Robert Haas
robertmhaas@gmail.com
In reply to: Itagaki Takahiro (#44)
Re: string function - "format" function proposal

On Mon, Oct 18, 2010 at 7:37 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Sat, Oct 16, 2010 at 7:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:

No doubt.  The problem is that we're going to end up with those bells
and whistles in two places: in to_char or other type-specific
formatting functions, and again in format.

If we decide to use C-like sprintf(), I think the only thing we can do
is to implement C-syntax as much as possible. Users will expect the
function behaves as sprintf, because it has the similar syntax.
It's not an item for now, but someone would request it at a future date.

BTW, the interoperability is why I proposed {} syntax. For example,
{1:YYYY-MM-DD} for date is expanded to to_char($1, 'YYYY-MM-DD').
(Maybe it's not so easy; It requires function lookups depending on types.)

There's no particular reason why we couldn't make this work with
sprintf-type syntax; for example, you could allow %{XYZ} to mean
to_char(value, 'XYZ'). But it seems to me that we have agreement that
this should start with just %s, %I, %L and allow 3$ or similar in the
middle to specify which argument it is. We can then argue about how
many more bells and whistles to add later.

I would like to bounce this back for rework along the lines described
above and ask for a resubmit to the next CF. We are out of time to
consider this further for this CF, and clearly it's not ready to go
ATM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#47Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#46)
Re: string function - "format" function proposal

2010/10/18, Robert Haas <robertmhaas@gmail.com>:

On Mon, Oct 18, 2010 at 7:37 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Sat, Oct 16, 2010 at 7:29 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

No doubt. The problem is that we're going to end up with those bells
and whistles in two places: in to_char or other type-specific
formatting functions, and again in format.

If we decide to use C-like sprintf(), I think the only thing we can do
is to implement C-syntax as much as possible. Users will expect the
function behaves as sprintf, because it has the similar syntax.
It's not an item for now, but someone would request it at a future date.

BTW, the interoperability is why I proposed {} syntax. For example,
{1:YYYY-MM-DD} for date is expanded to to_char($1, 'YYYY-MM-DD').
(Maybe it's not so easy; It requires function lookups depending on types.)

There's no particular reason why we couldn't make this work with
sprintf-type syntax; for example, you could allow %{XYZ} to mean
to_char(value, 'XYZ'). But it seems to me that we have agreement that
this should start with just %s, %I, %L and allow 3$ or similar in the
middle to specify which argument it is. We can then argue about how
many more bells and whistles to add later.

so, yes. Can we finish this discus with this result? I'll prepare
patch for next commit fest. Next question - what about sprintf
function in core? Is living this idea still?

Regards

Pavel

Show quoted text

I would like to bounce this back for rework along the lines described
above and ask for a resubmit to the next CF. We are out of time to
consider this further for this CF, and clearly it's not ready to go
ATM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#48Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Stehule (#47)
Re: string function - "format" function proposal

On Mon, Oct 18, 2010 at 1:07 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/10/18, Robert Haas <robertmhaas@gmail.com>:

On Mon, Oct 18, 2010 at 7:37 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:

On Sat, Oct 16, 2010 at 7:29 AM, Robert Haas <robertmhaas@gmail.com>
wrote:

No doubt.  The problem is that we're going to end up with those bells
and whistles in two places: in to_char or other type-specific
formatting functions, and again in format.

If we decide to use C-like sprintf(), I think the only thing we can do
is to implement C-syntax as much as possible. Users will expect the
function behaves as sprintf, because it has the similar syntax.
It's not an item for now, but someone would request it at a future date.

BTW, the interoperability is why I proposed {} syntax. For example,
{1:YYYY-MM-DD} for date is expanded to to_char($1, 'YYYY-MM-DD').
(Maybe it's not so easy; It requires function lookups depending on types.)

There's no particular reason why we couldn't make this work with
sprintf-type syntax; for example, you could allow %{XYZ} to mean
to_char(value, 'XYZ').  But it seems to me that we have agreement that
this should start with just %s, %I, %L and allow 3$ or similar in the
middle to specify which argument it is.  We can then argue about how
many more bells and whistles to add later.

so, yes. Can we finish this discus with this result? I'll prepare
patch for next commit fest. Next question - what about sprintf
function in core? Is living this idea still?

I'm indifferent about whether we put it in core or contrib.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company