string function - "format" function proposal
Hello
I am returning back to string functions. For me, the most important
function isn't commited still. There was discussion about "format" or
"sprintf" fuction. So I'll do a small resume.
goal: to get function that helps with formatting a message texts and
helps with building a SQL commands (used as dynamic SQL)
propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" function
Itagaki objectives to "format" function:
* there are not possibility put two parameters without a space between
* it is too simple
My objectives to sprintf function:
* it is designed to different environment than SQL - missing support
NULL, missing support for date, timestamp, boolean, ...
* it is too complex, some parameters has different meaning for different tags
* we have a "to_char" function for complex formatting now.
Now I propose a compromise - "format" function with only three tags:
%s .. some string
%i .. SQL identifier
%l .. string literal
using a NULL:
for %s NULL is transformed to empty string - like "concat"
for %i NULL raises an exception
for %l NULL is transformed to ' NULL ' string.
This system is still simple and enough.
Implemented sprintf function can be moved to "sprintf" contrib module.
comments
Regards
Pavel Stehule
On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" functionNow I propose a compromise - "format" function with only three tags:
%s .. some string
%i .. SQL identifier
%l .. string literal
These are just ideas:
* Use $n, as like as PREPARE command.
It allows for us to swap arguments in any order.
SELECT format('$2 before $1', 'aaa', 'bbb')
* Call to_char() functions for each placeholder.
For example,
format('=={YYYY-MM-DD}==', tm::timestamp)
is equivalent to
'==' || to_char(tm, 'YYYY-MM-DD') || '=='
'{}' prints the input with the default format.
New languages' libraries might be of some help. LLs, C#, etc.
--
Itagaki Takahiro
2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" functionNow I propose a compromise - "format" function with only three tags:
%s .. some string
%i .. SQL identifier
%l .. string literalThese are just ideas:
* Use $n, as like as PREPARE command.
It allows for us to swap arguments in any order.
SELECT format('$2 before $1', 'aaa', 'bbb')
what is use case for this feature? I don't see it.
* Call to_char() functions for each placeholder.
For example,
format('=={YYYY-MM-DD}==', tm::timestamp)
is equivalent to
'==' || to_char(tm, 'YYYY-MM-DD') || '=='
'{}' prints the input with the default format.New languages' libraries might be of some help. LLs, C#, etc.
I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?
Regards
Pavel
Show quoted text
--
Itagaki Takahiro
Excerpts from Pavel Stehule's message of lun ago 30 07:51:55 -0400 2010:
2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" functionNow I propose a compromise - "format" function with only three tags:
%s .. some string
%i .. SQL identifier
%l .. string literalThese are just ideas:
* Use $n, as like as PREPARE command.
It allows for us to swap arguments in any order.
SELECT format('$2 before $1', 'aaa', 'bbb')what is use case for this feature? I don't see it.
Translations :-) I haven't had a use for that but I've heard people
implements gettext of sorts in database tables. Maybe that kind of
thing would be of use here.
* Call to_char() functions for each placeholder.
For example,
format('=={YYYY-MM-DD}==', tm::timestamp)
is equivalent to
'==' || to_char(tm, 'YYYY-MM-DD') || '=='
'{}' prints the input with the default format.New languages' libraries might be of some help. LLs, C#, etc.
I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?
What about %{sth}? That way you don't need to escape {. The closing } would
need escaping only inside the %{} specifier, so {%{YYYY{\}MM}} prints
{2010{}08} So the above example is:
format('==%{YYYY-MM-DD}==', tm::timestamp);
Not sure about this to_char stuff though, seems too cute. You can do
the case above like this:
format('==%s==', to_char(tm::timestamp, 'YYYY-MM-DD'))
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
2010/8/30 Alvaro Herrera <alvherre@commandprompt.com>:
Excerpts from Pavel Stehule's message of lun ago 30 07:51:55 -0400 2010:
2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" functionNow I propose a compromise - "format" function with only three tags:
%s .. some string
%i .. SQL identifier
%l .. string literalThese are just ideas:
* Use $n, as like as PREPARE command.
It allows for us to swap arguments in any order.
SELECT format('$2 before $1', 'aaa', 'bbb')what is use case for this feature? I don't see it.
Translations :-) I haven't had a use for that but I've heard people
implements gettext of sorts in database tables. Maybe that kind of
thing would be of use here.* Call to_char() functions for each placeholder.
For example,
format('=={YYYY-MM-DD}==', tm::timestamp)
is equivalent to
'==' || to_char(tm, 'YYYY-MM-DD') || '=='
'{}' prints the input with the default format.New languages' libraries might be of some help. LLs, C#, etc.
I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?What about %{sth}? That way you don't need to escape {. The closing } would
need escaping only inside the %{} specifier, so {%{YYYY{\}MM}} prints
{2010{}08} So the above example is:
then you need escaping too :)
format('==%{YYYY-MM-DD}==', tm::timestamp);
I am not sure if this is correct -but why not
so there are possible combinations
%s .. no quoting, NULL is ''
%{} .. no quoting, NULL is NULL .. like output from to_char
%{}s .. no quoting with formatting, NULL is ''
now I have not idea about nice syntax for positional parameters - maybe
%{...}$1s or we can use a two variants for tags - not positional '%'
and positional '%', so
$1{...}s, %{...}s, $1, %s, $1s, $1{...}, %{...} can be valid tags
Regards
Pavel Stehule
Not sure about this to_char stuff though, seems too cute. You can do
the case above like this:format('==%s==', to_char(tm::timestamp, 'YYYY-MM-DD'))
I like an using a format like tag - there are not technical problem -
format can be taken from string and data type parameter can be known
too. But this feature can be some enhancing. The basic features are
NULL handling and right quoting.
Show quoted text
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hello
attached WIP patch.
I implement only basic format's tags related to SQL: string, value,
literal, sql identifier. These tags are basic, but there are not any
break to implement any other formats or enhance a syntax. The mix with
to_char function is more complex then I expected - so I don't thinking
about it for now (there are more then one to_char function).
I don't found a nice mix for placeholders and positional placeholders
- so I propose a new special function "substitute" (in contrib) where
placeholders are positional. More - we check in function "format" if
all parameters are used - this check isn't related to positional
placeholders, this is reason for separate implementation too:
so some examples:
postgres=# select substitute('second parameter is "$2" and first
parameter is "$1"', 'first parameter', 'second parameter');
substitute
─────────────────────────────────────────────────────────────────────────────────
second parameter is "second parameter" and first parameter is "first parameter"
(1 row)
postgres=# select format('INSERT INTO %i (c1, c2, c3, c4) VALUES
(%v,%v,%v,%v)', 'my tab',1, NULL, true, 'hello');
format
────────────────────────────────────────────────────────────────
INSERT INTO "my tab" (c1, c2, c3, c4) VALUES (1,NULL,t,'hello')
(1 row)
postgres=# select format('SQL identifier %i cannot be a NULL', NULL);
ERROR: SQL identifier cannot be a NULL
postgres=# select format('NULL is %v or empty string "%s"', NULL, NULL);
format
─────────────────────────────────
NULL is NULL or empty string ""
(1 row)
%i ... sql identifier
%v ... sql value
%s ... string --- the most used tag I expect
%l ... literal
I hope so this system is clean, simple, readable and extensible
Regards
Pavel
2010/8/30 Pavel Stehule <pavel.stehule@gmail.com>:
Show quoted text
2010/8/30 Alvaro Herrera <alvherre@commandprompt.com>:
Excerpts from Pavel Stehule's message of lun ago 30 07:51:55 -0400 2010:
2010/8/30 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Mon, Aug 30, 2010 at 7:58 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
propsals:
* "format" function - uses same formatting as PL/pgSQL RAISE statement
* "sprintf" functionNow I propose a compromise - "format" function with only three tags:
%s .. some string
%i .. SQL identifier
%l .. string literalThese are just ideas:
* Use $n, as like as PREPARE command.
It allows for us to swap arguments in any order.
SELECT format('$2 before $1', 'aaa', 'bbb')what is use case for this feature? I don't see it.
Translations :-) I haven't had a use for that but I've heard people
implements gettext of sorts in database tables. Maybe that kind of
thing would be of use here.* Call to_char() functions for each placeholder.
For example,
format('=={YYYY-MM-DD}==', tm::timestamp)
is equivalent to
'==' || to_char(tm, 'YYYY-MM-DD') || '=='
'{}' prints the input with the default format.New languages' libraries might be of some help. LLs, C#, etc.
I though about integration with to_char function too. There are not
technical barrier. And I can live with just {to_char_format} too. It
can be or cannot be mixed with basic tags together - there is
specified a NULL value behave. If we allow {format} syntax, then we
have to specify a escape syntax for { and }. Do you have a some idea?What about %{sth}? That way you don't need to escape {. The closing } would
need escaping only inside the %{} specifier, so {%{YYYY{\}MM}} prints
{2010{}08} So the above example is:then you need escaping too :)
format('==%{YYYY-MM-DD}==', tm::timestamp);
I am not sure if this is correct -but why not
so there are possible combinations
%s .. no quoting, NULL is ''
%{} .. no quoting, NULL is NULL .. like output from to_char
%{}s .. no quoting with formatting, NULL is ''now I have not idea about nice syntax for positional parameters - maybe
%{...}$1s or we can use a two variants for tags - not positional '%'
and positional '%', so
$1{...}s, %{...}s, $1, %s, $1s, $1{...}, %{...} can be valid tagsRegards
Pavel Stehule
Not sure about this to_char stuff though, seems too cute. You can do
the case above like this:format('==%s==', to_char(tm::timestamp, 'YYYY-MM-DD'))
I like an using a format like tag - there are not technical problem -
format can be taken from string and data type parameter can be known
too. But this feature can be some enhancing. The basic features are
NULL handling and right quoting.--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Aug 31, 2010, at 5:07 PM, Pavel Stehule wrote:
Hello
attached WIP patch.
I implement only basic format's tags related to SQL: string, value,
literal, sql identifier. These tags are basic, but there are not any
break to implement any other formats or enhance a syntax. The mix with
to_char function is more complex then I expected - so I don't thinking
about it for now (there are more then one to_char function).
<snip>
It would be pretty handy if plpgsql EXECUTE could operate like this with USING to support identifiers.
Cheers,
M
On Tue, Aug 31, 2010 at 11:07:40PM +0200, Pavel Stehule wrote:
Hello
attached WIP patch.
I don't see it attached. Is it just me?
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Sep 1, 2010 at 6:07 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I don't found a nice mix for placeholders and positional placeholders
How about %pos$format, used in C-printf()? It might be
only in Linux's libc.
printf("<%2$s> <%1$d>\n", 123, "abc");
=> <abc> <123>
http://linux.die.net/man/3/printf
%i ... sql identifier
%v ... sql value
%s ... string --- the most used tag I expect
%l ... literal
Looks good designed. I have a couple of comments and questions:
* There is no examples for %l. What's the difference from %v and %s?
If it always quotes, how does it work? Like as quote_literal()
or quote_nullable()?
* %v quotes text values (and maybe all non-numeric values) with
single quotes, but doesn't numeric values. How do we determine
the difference? By type oid?
* %v also doesn't quote boolean values, but t and f are not valid.
You should use true and false (or 't' and 'f') for the cases.
(So, your "INSERT INTO" example is broken.)
--
Itagaki Takahiro
2010/9/1 David Fetter <david@fetter.org>:
On Tue, Aug 31, 2010 at 11:07:40PM +0200, Pavel Stehule wrote:
Hello
attached WIP patch.
I don't see it attached. Is it just me?
sorry, it was at 1 ofter midnight
Regards
Pavel
Show quoted text
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.icsRemember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachments:
ft02.difftext/x-patch; charset=US-ASCII; name=ft02.diffDownload+534-116
2010/9/1 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Wed, Sep 1, 2010 at 6:07 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I don't found a nice mix for placeholders and positional placeholders
How about %pos$format, used in C-printf()? It might be
only in Linux's libc.printf("<%2$s> <%1$d>\n", 123, "abc");
=> <abc> <123>
http://linux.die.net/man/3/printf
same syntax I designed and didn't implement, because it isn't too
readable (my opinion). But I am not against - just thinking so
separate function can be better.
%i ... sql identifier
%v ... sql value
%s ... string --- the most used tag I expect
%l ... literalLooks good designed. I have a couple of comments and questions:
* There is no examples for %l. What's the difference from %v and %s?
If it always quotes, how does it work? Like as quote_literal()
or quote_nullable()?
%l is always quoted
it is designed for syntax: SELECT integer '10';
%s versus %v ... %s is never quoted X %v is quoted when it is
necessary, NULL is showed as empty string for %s and as "NULL" for %v.
%s is used for messages (the behave is same like "concat"), %v is used
for SQL statement building
* %v quotes text values (and maybe all non-numeric values) with
single quotes, but doesn't numeric values. How do we determine
the difference? By type oid?
every datatype has typecategory attribute
typname │ typcategory
─────────────────┼─────────────
int8 │ N
int2 │ N
int4 │ N
regproc │ N
oid │ N
float4 │ N
float8 │ N
money │ N
numeric │ N
regprocedure │ N
regoper │ N
regoperator │ N
regclass │ N
regtype │ N
regconfig │ N
regdictionary │ N
cardinal_number │ N
so these types are unquoted
* %v also doesn't quote boolean values, but t and f are not valid.
You should use true and false (or 't' and 'f') for the cases.
(So, your "INSERT INTO" example is broken.)
you have a true - it should be fixed
Regards
Pavel
Show quoted text
--
Itagaki Takahiro
On Wed, Sep 1, 2010 at 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
* %v also doesn't quote boolean values, but t and f are not valid.
You should use true and false (or 't' and 'f') for the cases.you have a true - it should be fixed
I found quote_literal() prints boolean values as 'true' or 'false'.
It uses casting to text type rather than calling output function.
OTOH, format functions (and concat funcs) use output functions.
Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.
--
Itagaki Takahiro
2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Wed, Sep 1, 2010 at 1:29 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
* %v also doesn't quote boolean values, but t and f are not valid.
You should use true and false (or 't' and 'f') for the cases.you have a true - it should be fixed
I found quote_literal() prints boolean values as 'true' or 'false'.
It uses casting to text type rather than calling output function.
OTOH, format functions (and concat funcs) use output functions.Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.
Personally I prefer casting to text - it allows some later
customizations. And it's more consistent with || operator. So the
functions concat* should to be fixed.
Regards
Pavel
Show quoted text
--
Itagaki Takahiro
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.
Personally I prefer casting to text -
No, you need to use the I/O functions. Not every type is guaranteed to
have a cast to text.
iit allows some later
customizations. And it's more consistent with || operator.
I don't buy either of those arguments.
regards, tom lane
2010/9/6 Tom Lane <tgl@sss.pgh.pa.us>:
Pavel Stehule <pavel.stehule@gmail.com> writes:
2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
Which should we use for such purposes? Consistent behavior is
obviously preferred. Boolean type might be the only type that
is converted to different representation in typoutput or cast-to-test,
but we should consider to have boolean-specific hardwired code,
or cast all types to text instead of output functions.Personally I prefer casting to text -
No, you need to use the I/O functions. Not every type is guaranteed to
have a cast to text.iit allows some later
customizations. And it's more consistent with || operator.I don't buy either of those arguments.
can we use a both? like plpgsql? First check cast to text, and second
use a IO functions?
Why I think so this is useful - sometimes people asked some GUC for
formatting date, boolean and other. If these functions try to use a
cast to text first, then there is some space for customization via
custom cast functions.
Regards
Pavel Stehule
Show quoted text
regards, tom lane
On Mon, Sep 6, 2010 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
No, you need to use the I/O functions. Not every type is guaranteed to
have a cast to text.
One issue is that Pavel want to generate valid SQL statement using
%v format. Boolean values are printed as t or f, so the unquoted
values are not valid syntax.
If we only use output functions, boolean values should be written as
't' or 'f' (single-quoted), Only numeric values can be unquoted on %v.
--
Itagaki Takahiro
2010/9/6 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
On Mon, Sep 6, 2010 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
No, you need to use the I/O functions. Not every type is guaranteed to
have a cast to text.One issue is that Pavel want to generate valid SQL statement using
%v format. Boolean values are printed as t or f, so the unquoted
values are not valid syntax.
we can format some types directly - but I like idea of casting to text
because there is space for users.
Pavel
Show quoted text
If we only use output functions, boolean values should be written as
't' or 'f' (single-quoted), Only numeric values can be unquoted on %v.--
Itagaki Takahiro
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
On Mon, Sep 6, 2010 at 10:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
No, you need to use the I/O functions. Not every type is guaranteed to
have a cast to text.
One issue is that Pavel want to generate valid SQL statement using
%v format. Boolean values are printed as t or f, so the unquoted
values are not valid syntax.
So? You'd need to quote the values anyway, in general. If you want
something that will be valid SQL you'd better include the functionality
of quote_literal() in it.
If we only use output functions, boolean values should be written as
't' or 'f' (single-quoted), Only numeric values can be unquoted on %v.
I'm not sure that it's a good idea to have any type-specific special
cases. Failing to quote numeric values will bring in the whole set of
issues about how the parser initially types numeric constants, as in the
other thread over the weekend.
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
Why I think so this is useful - sometimes people asked some GUC for
formatting date, boolean and other. If these functions try to use a
cast to text first, then there is some space for customization via
custom cast functions.
This is basically nonsense. If you don't control a type's output
function, you don't control its cast to text either. Nor do I think
it's a good idea to encourage people to make their casts to text operate
differently from their output functions. We have that one wart in
boolean casting because the SQL standard specifies the result of cast to
text and it's different from our historical practice in the bool output
function --- but it is a wart, not something we should encourage people
to emulate.
regards, tom lane
On Mon, Sep 6, 2010 at 11:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So? You'd need to quote the values anyway, in general. If you want
something that will be valid SQL you'd better include the functionality
of quote_literal() in it.I'm not sure that it's a good idea to have any type-specific special
cases.
As I remember, the original motivation of %v formatter is
some DBMSes don't like quoted numeric literals. However,
Postgres accepts quoted numerics, and we're developing Postgres.
So, our consensus would be %v formatter should be removed
completely from the format function.
--
Itagaki Takahiro