format() with embedded to_char() formatter

Started by Itagaki Takahiroabout 15 years ago8 messages
#1Itagaki Takahiro
itagaki.takahiro@gmail.com

format() function is very useful to construct formatted text,
but it doesn't support embedded formatter unlike sprintf() in C.
Of course, we can use to_char() functions for each argument value,
but embedded formatter would be more readable.

I'd like to propose %{...}s syntax, where format('%{xxx}s', arg)
is equivalent to format('%s', to_char(arg, 'xxx')). I think the
approach is better than implement C-like formatter because we
can reuse existing to_char() functions for the purpose.

Here are examples for the usage:

=# SELECT format('%{FM0000}s : %{YYYY-MM-DD}L', 123, current_timestamp);
format
---------------------
0123 : '2010-11-22'

=# SELECT format('CREATE TABLE partition_%{YYYYMMDD}s () INHERITS
parent', current_date);
format
----------------------------------------------------
CREATE TABLE partition_20101122 () INHERITS parent

Is it interesting? Comments welcome.

--
Itagaki Takahiro

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Itagaki Takahiro (#1)
Re: format() with embedded to_char() formatter

Hello

There is little bit complication. There are no one "to_char" function
- so you cannot to use DirectFunctionCall API.

but I am not against to this proposal.

regards

Pavel

2010/11/22 Itagaki Takahiro <itagaki.takahiro@gmail.com>:

Show quoted text

format() function is very useful to construct formatted text,
but it doesn't support embedded formatter unlike sprintf() in C.
Of course, we can use to_char() functions for each argument value,
but embedded formatter would be more readable.

I'd like to propose %{...}s syntax, where format('%{xxx}s', arg)
is equivalent to format('%s', to_char(arg, 'xxx')). I think the
approach is better than implement C-like formatter because we
can reuse existing to_char() functions for the purpose.

Here are examples for the usage:

=# SELECT format('%{FM0000}s : %{YYYY-MM-DD}L', 123, current_timestamp);
      format
---------------------
 0123 : '2010-11-22'

=# SELECT format('CREATE TABLE partition_%{YYYYMMDD}s () INHERITS
parent',  current_date);
                      format
----------------------------------------------------
 CREATE TABLE partition_20101122 () INHERITS parent

Is it interesting? Comments welcome.

--
Itagaki Takahiro

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Itagaki Takahiro (#1)
Re: format() with embedded to_char() formatter

Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:

I'd like to propose %{...}s syntax, where format('%{xxx}s', arg)
is equivalent to format('%s', to_char(arg, 'xxx')). I think the
approach is better than implement C-like formatter because we
can reuse existing to_char() functions for the purpose.

This seems pretty gross, not least because the existing to_char
functions are so limited and broken. I don't really want to make
format() incorporate all the brain damage in timestamp to_char, in
particular. Also, it doesn't seem that you're really getting much
notational leverage with this proposal. And lastly, AFAICS there
is no way to do what you suggest without some really ugly kluges
in the parser --- I think the function parsing code would have to
have special cases to make format() work like this.

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: format() with embedded to_char() formatter

On Mon, Nov 22, 2010 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:

I'd like to propose %{...}s syntax, where format('%{xxx}s', arg)
is equivalent to format('%s', to_char(arg, 'xxx')). I think the
approach is better than implement C-like formatter because we
can reuse existing to_char() functions for the purpose.

This seems pretty gross, not least because the existing to_char
functions are so limited and broken.  I don't really want to make
format() incorporate all the brain damage in timestamp to_char, in
particular.

If the existing to_char() functions are limited and broken, maybe we
ought to fix them. I am reminded of some wit's quote that XML is like
violence - if it doesn't solve your problem, you aren't using enough
of it. I'm not a believer in that philosophy, and don't think that
adding a whole new set of functions with incompatible semantics is the
right way to fix problems with the existing functions. We have enough
of that already - especially around dates - and it sucks badly enough
as it is. Non-orthogonality is bad.

Also, it doesn't seem that you're really getting much
notational leverage with this proposal.

I am not sure I agree. It seems quite convenient to me to be able to
encode all the formatting crap in the message string, rather than
spreading it out all over the SQL statement. Maybe time for another
poll on -general.

And lastly, AFAICS there
is no way to do what you suggest without some really ugly kluges
in the parser --- I think the function parsing code would have to
have special cases to make format() work like this.

Huh?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: format() with embedded to_char() formatter

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 22, 2010 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

And lastly, AFAICS there
is no way to do what you suggest without some really ugly kluges
in the parser --- I think the function parsing code would have to
have special cases to make format() work like this.

Huh?

How exactly are you going to get from

format('string here', timestamp_expr)

to

format('string here', to_char(timestamp_expr))

especially seeing that "to_char" is not one function but an overloaded
family of functions (doubtless soon to become even more overloaded,
if this proposal is adopted)?

Or is the intention to replicate the parser's
overloaded-function-resolution behavior at runtime? That seems awkward,
duplicative, slow, and probably prone to security issues (think
search_path).

Or perhaps Itagaki-san intended to hard-wire a fixed set of to_char
functions that format() knows about. That seems to lose whatever minor
charms the proposal possessed, because it wouldn't be extensible without
changing format()'s C code.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: format() with embedded to_char() formatter

On Mon, Nov 22, 2010 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 22, 2010 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

And lastly, AFAICS there
is no way to do what you suggest without some really ugly kluges
in the parser --- I think the function parsing code would have to
have special cases to make format() work like this.

Huh?

How exactly are you going to get from

       format('string here', timestamp_expr)

to

       format('string here', to_char(timestamp_expr))

especially seeing that "to_char" is not one function but an overloaded
family of functions (doubtless soon to become even more overloaded,
if this proposal is adopted)?

Or is the intention to replicate the parser's
overloaded-function-resolution behavior at runtime?  That seems awkward,
duplicative, slow, and probably prone to security issues (think
search_path).

Ick.

Or perhaps Itagaki-san intended to hard-wire a fixed set of to_char
functions that format() knows about.  That seems to lose whatever minor
charms the proposal possessed, because it wouldn't be extensible without
changing format()'s C code.

Extensibility would be (really) nice to have, but the feature may have
some merit even without that. I certainly spend a lot more time
formatting built-in types than custom ones.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Robert Haas (#6)
Re: format() with embedded to_char() formatter

2010/11/22 Robert Haas <robertmhaas@gmail.com>:

On Mon, Nov 22, 2010 at 12:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Nov 22, 2010 at 9:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

And lastly, AFAICS there
is no way to do what you suggest without some really ugly kluges
in the parser --- I think the function parsing code would have to
have special cases to make format() work like this.

Huh?

How exactly are you going to get from

       format('string here', timestamp_expr)

to

       format('string here', to_char(timestamp_expr))

especially seeing that "to_char" is not one function but an overloaded
family of functions (doubtless soon to become even more overloaded,
if this proposal is adopted)?

a code for this is available now - if there is a some break, then it
is a "search_path". But this is a general problem. Probably isn't
significant problem to limit search_path just for pg_catalog.

Or is the intention to replicate the parser's
overloaded-function-resolution behavior at runtime?  That seems awkward,
duplicative, slow, and probably prone to security issues (think
search_path).

Ick.

Or perhaps Itagaki-san intended to hard-wire a fixed set of to_char
functions that format() knows about.  That seems to lose whatever minor
charms the proposal possessed, because it wouldn't be extensible without
changing format()'s C code.

Extensibility would be (really) nice to have, but the feature may have
some merit even without that.  I certainly spend a lot more time
formatting built-in types than custom ones.

The implementation should not be complex or ugly, but it can returns
back dependency problem.

Regards

Pavel Stehule

Show quoted text

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Itagaki Takahiro
itagaki.takahiro@gmail.com
In reply to: Pavel Stehule (#7)
1 attachment(s)
Re: format() with embedded to_char() formatter

On Tue, Nov 23, 2010 at 03:08, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Or is the intention to replicate the parser's
overloaded-function-resolution behavior at runtime?  That seems awkward,
duplicative, slow, and probably prone to security issues (think
search_path).

Ick.

I've thought lookup_agg_function() is available for the purpose, but
type coercion is required in some cases, for example, to_char(date).
The parser performs the task in normal cases, but format() does it
in execution time. I have no solution for the issue.

Or perhaps Itagaki-san intended to hard-wire a fixed set of to_char
functions that format() knows about.  That seems to lose whatever minor
charms the proposal possessed, because it wouldn't be extensible without
changing format()'s C code.

Extensibility would be (really) nice to have, but the feature may have
some merit even without that.  I certainly spend a lot more time
formatting built-in types than custom ones.

The implementation should not be complex or ugly,  but it can returns
back dependency problem.

Hard-wired approach might be a bit safer than the above because we
can restrict acceptable set of types. However, we might need to add
additional to_char() functions for often-used types to avoid errors,
especially for date and int2 types.

Just for reference, I attached my past works. It would be a bad
design as discussed above, but it would be a help to see in which
case the approach doesn't work.

--
Itagaki Takahiro

Attachments:

format_with_to_char-20101125.patchapplication/octet-stream; name=format_with_to_char-20101125.patchDownload
diff --git a/src/backend/catalog/pg_aggregate.c b/src/backend/catalog/pg_aggregate.c
index eadf41f..cf65b5e 100644
*** a/src/backend/catalog/pg_aggregate.c
--- b/src/backend/catalog/pg_aggregate.c
***************
*** 34,40 ****
  #include "utils/syscache.h"
  
  
! static Oid lookup_agg_function(List *fnName, int nargs, Oid *input_types,
  					Oid *rettype);
  
  
--- 34,40 ----
  #include "utils/syscache.h"
  
  
! extern Oid lookup_agg_function(List *fnName, int nargs, Oid *input_types,
  					Oid *rettype);
  
  
*************** AggregateCreate(const char *aggName,
*** 298,304 ****
  /*
   * lookup_agg_function -- common code for finding both transfn and finalfn
   */
! static Oid
  lookup_agg_function(List *fnName,
  					int nargs,
  					Oid *input_types,
--- 298,304 ----
  /*
   * lookup_agg_function -- common code for finding both transfn and finalfn
   */
! Oid
  lookup_agg_function(List *fnName,
  					int nargs,
  					Oid *input_types,
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ee83e2f..b324b48 100644
*** a/src/backend/utils/adt/varlena.c
--- b/src/backend/utils/adt/varlena.c
***************
*** 22,27 ****
--- 22,28 ----
  #include "libpq/md5.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
+ #include "parser/parser.h"
  #include "parser/scansup.h"
  #include "regex/regex.h"
  #include "utils/builtins.h"
*************** text_format(PG_FUNCTION_ARGS)
*** 3820,3825 ****
--- 3821,3915 ----
  		isNull = PG_ARGISNULL(arg);
  		typid = get_fn_expr_argtype(fcinfo->flinfo, arg);
  
+ 		if (*cp == '{')
+ 		{
+ 			const char *format = cp + 1;
+ 
+ 			/* search right parensis */
+ 			for (; cp < end_ptr && *cp != '}'; cp++)
+ 			{
+ 				/* XXX: Should we supply a method to escape '}' ? */
+ 			}
+ 			if (cp >= end_ptr)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 							errmsg("conversion format is incomplete")));
+ 
+ 			/* format the value with to_char function */
+ 			if (!isNull)
+ 			{
+ 				Oid		fnOid;
+ 				Oid		rettype;
+ 				Datum	formatDatum;
+ 
+ #define enable_dynamic_to_char_lookup		0
+ 
+ 				if (enable_dynamic_to_char_lookup)
+ 				{
+ 					Oid		input_types[2];
+ 
+ extern Oid lookup_agg_function(List *fnName, int nargs, Oid *input_types,
+ 					Oid *rettype);
+ 
+ 					/*
+ 					 * Only lookup pg_catalog.to_char() and avoid searching
+ 					 * functions in search_path so that it should not be a
+ 					 * security hole.
+ 					 */
+ 					input_types[0] = typid;
+ 					input_types[1] = TEXTOID;
+ 					fnOid = lookup_agg_function(SystemFuncName("to_char"),
+ 								2, input_types, &rettype);
+ 				}
+ 				else
+ 				{
+ 					rettype = TEXTOID;
+ 
+ 					switch (typid)
+ 					{
+ 						case INT4OID:
+ 							fnOid = 1773;
+ 							break;
+ 						case INT8OID:
+ 							fnOid = 1774;
+ 							break;
+ 						case FLOAT4OID:
+ 							fnOid = 1775;
+ 							break;
+ 						case FLOAT8OID:
+ 							fnOid = 1776;
+ 							break;
+ 						case NUMERICOID:
+ 							fnOid = 1772;
+ 							break;
+ 						case TIMESTAMPTZOID:
+ 							fnOid = 1770;
+ 							break;
+ 						case TIMESTAMPOID:
+ 							fnOid = 2049;
+ 							break;
+ 						case INTERVALOID:
+ 							fnOid = 1768;
+ 							break;
+ 						default:
+ 							elog(ERROR, "unsupported type oid: %u", typid);
+ 							fnOid = rettype = InvalidOid;	/* keep compiler quiet */
+ 					}
+ 				}
+ 
+ 				formatDatum = PointerGetDatum(
+ 					cstring_to_text_with_len(format, cp - format));
+ 				value = OidFunctionCall2(fnOid, value, formatDatum);
+ 				typid = rettype;
+ 			}
+ 
+ 			/* Did we run off the end of the string? */
+ 			if (++cp >= end_ptr)
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unterminated conversion specifier")));
+ 		}
+ 
  		switch (*cp)
  		{
  			case 's':