Review: listagg aggregate

Started by David E. Wheeleralmost 16 years ago83 messages

david@kineticode.com

almost 16 years ago

1 attachment(s)

Pavel,

My review of your listagg patch.

Submission Review
-----------------
* The diff is a context diff and applies cleanly to HEAD (with just two hunks offset by 2 lines each).

* There is documentation, though I'm not sure it needs to be mentioned in the string functions documentation. No harm in it, I guess.

I would like to see an example, though, and the documentation does not currently explain what each of the parameters are for. In fact, it looks like all the existing aggregates take only one parameter, so there was not previously a need to explain it. But listagg() has an optional second param. I think that the description should explain what it's for.

* There are tests and they look fine.

Usability Review
----------------
* The patch does in fact implement the aggregate function it describes, and OH YES do we want it (I've written my own in SQL a few times).

* No, we don't already have it.

* Yes it follows community-agreed behavior. I'm assuming that there is no special parsing of aggregate functions, so the simple use of commas to separate the two parameters is appropriate, rather than using a keyword like MySQL's SEPARATOR in the group_concat() aggregate.

* No need to have pg_dump support, no dangers that I can see, looks like all the bases have been covered.

Feature Test
------------
* Everything built cleanly, but I got an OID dupe error when I tried to init the DB. Looks like 2997 and 2998 have been used for something else since you created the patch. I changed them to 2995 and 2996 and then it worked.
* The feature appears to work. I didn't see any tests for encodings or other data types, so I ran a few myself and they work fine:

postgres=# select listagg(a, U&'-\0441\043B\043E\043D-') from (values('aaaa'),('bbbb'),('cccc'
listagg
--------------------------
aaaa-слон-bbbb-слон-cccc
(1 row)

postgres=# select listagg(a, U&'\2014') from (values(U&'\0441\043B\043E\043D'),(U&'d\0061t\+000061'),(U&'\0441\043B\043E\043D')) AS g(a);
listagg
----------------
слон—data—слон
(1 row)

postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a);
listagg
---------
123
(1 row)

Performance Review
------------------

No performance issues, except that it should be faster than a custom aggregate that does the same thing. To test, I created a quick custom aggregate (no second argument, alas, so listagg() is more flexible) like so:

CREATE OR REPLACE FUNCTION a2s(ANYARRAY)
RETURNS TEXT LANGUAGE SQL AS $$
SELECT array_to_string($1, ',');
$$;

CREATE AGGREGATE string_accum (
SFUNC = array_accum,
BASETYPE = ANYELEMENT,
STYPE = ANYARRAY,
INITCOND = '{}',
FINALFUNC = a2s
);

Then I ran some simple tests (thanks for the clue, depesz):

postgres=# select count(*) from (select string_accum(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a), generate_series(1,10000) i) AS x(i);
count
-------
1
(1 row)

Time: 1365.382 ms

postgres=# select count(*) from (select listagg(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a), generate_series(1,10000) i) AS x(i);
count
-------
1
(1 row)

Time: 17.989 ms

So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, and my system is built with --enable-cassert and --enable-debug. Still, good job.

Coding Review
-------------

* Is varchar.c really the best place to put the ListAggState struct and the listagg() function? I grepped the source for array_agg() and it's in src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for string functions? Otherwise, the style of the C code looks fine to my untrained eye.

Actually, shouldn't it return text rather than varchar?

* Does it really require four functions to do its work? Might there be some way to use the array_agg() C functions and then just a different final function to turn it into a string (using the internal array_to_string() function, perhaps)? I'm not at all sure about it, but given how little code was required to create the same basic functionality in SQL, I'm surprised that the C implementation requires four functions (accumStringResult(), listagg1_transfn(), listagg2_transfn(), and listagg_finalfn()). Maybe they're required to make it fast and avoid the overhead of an array?

* No compiler warnings, I never made it crash, good comments, does what it says on the tin. I doubt that there are any portability issues, as the code seems to use standard PostgreSQL internal macros and functions.

Architecture Review
-------------------

* No dependencies, things seem to make sense overall, notwithstanding my questions in the Coding Review.

Review Review
-------------

The only thing I haven't covered so far is the name. I agree with Tom's assertion that the name is awful. Sure there may be a precedent in Oracle, but I hardly find that convincing (some of the big corporations seem to do a really shitty job naming things in their APIs). Given that we already have array_agg(), what about simply concat_agg(), as Tom (wincingly, I grant you) suggested? Or string_agg()?

My bike shed is puce.

Attached is a new patch with the changed OIDs and an added phrase in the documentation that indicates that the second argument can be used to separate the concatenated values.

Best,

David

Attachments:

listagg.patchapplication/octet-stream; name=listagg.patch; x-unix-mode=0644Download

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5de44d6..a437d2c 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 1789,1794 ****
--- 1789,1798 ----
      </tgroup>
     </table>
  
+    <para>
+     See also <xref linkend="functions-aggregate"> about the aggregate
+     function <function>listagg</function>.
+    </para>
  
     <table id="conversion-names">
      <title>Built-in Conversions</title>
*************** SELECT NULLIF(value, '(none)') ...
*** 9789,9794 ****
--- 9793,9816 ----
       </row>
  
       <row>
+       <entry>
+        <indexterm>
+         <primary>listagg</primary>
+        </indexterm>
+        <function>
+ 	 listagg(<replaceable class="parameter">expression</replaceable> 
+ 	         [, <replaceable class="parameter">expression</replaceable> ] )</function>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>input values concatenated into an string, optionally separated by the second argument</entry>
+      </row>
+ 
+      <row>
        <entry><function>max(<replaceable class="parameter">expression</replaceable>)</function></entry>
        <entry>any array, numeric, string, or date/time type</entry>
        <entry>same as argument type</entry>
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index cbc08f7..cd13ffd 100644
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
***************
*** 18,27 ****
--- 18,40 ----
  #include "access/hash.h"
  #include "access/tuptoaster.h"
  #include "libpq/pqformat.h"
+ #include "lib/stringinfo.h"
+ #include "nodes/execnodes.h"
  #include "utils/array.h"
  #include "utils/builtins.h"
  #include "mb/pg_wchar.h"
  
+ /* type for state data of listagg function */
+ typedef struct
+ {
+ 	StringInfo	strInfo;
+ 	char	delimiter[1];		/* separator string - one or more chars */
+ } ListAggState;
+ 
+ static ListAggState *accumStringResult(ListAggState *state, 
+ 						    text *elem, 
+ 						    text *delimiter, 
+ 						    MemoryContext aggcontext);
  
  /* common code for bpchartypmodin and varchartypmodin */
  static int32
*************** btbpchar_pattern_cmp(PG_FUNCTION_ARGS)
*** 995,997 ****
--- 1008,1163 ----
  
  	PG_RETURN_INT32(result);
  }
+ 
+ /****************************************************************
+  * listagg
+  *  
+  * Concates values and returns string.
+  *
+  * Syntax:
+  *     FUNCTION listagg(string varchar, delimiter varchar = '')
+  *      RETURNS varchar;
+  *
+  * Note: any NULL value is ignored.
+  *
+  ****************************************************************/
+ static ListAggState *
+ accumStringResult(ListAggState *state, text *elem, text *delimiter, 
+ 						    MemoryContext aggcontext)
+ {
+ 	MemoryContext	oldcontext;
+ 	
+ 	/* 
+ 	 * when state is NULL, create new state value.
+ 	 */
+ 	if (state == NULL)
+ 	{
+ 		if (delimiter != NULL)
+ 		{
+ 			char *dstr = text_to_cstring(delimiter);
+ 			int len = strlen(dstr);
+ 			
+ 			oldcontext = MemoryContextSwitchTo(aggcontext);
+ 			state = palloc(sizeof(ListAggState) + len);
+ 			
+ 			/* copy delimiter to state var */
+ 			memcpy(&state->delimiter, dstr, len + 1);
+ 		}
+ 		else
+ 		{
+ 			oldcontext = MemoryContextSwitchTo(aggcontext);
+ 			state = palloc(sizeof(ListAggState));
+ 			state->delimiter[0] = '\0';
+ 		}
+ 		
+ 		/* Initialise StringInfo */
+ 		state->strInfo = NULL;
+ 		
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	
+ 	/* only when element isn't null */
+ 	if (elem != NULL)
+ 	{
+ 		char	*value = text_to_cstring(elem);
+ 
+ 		oldcontext = MemoryContextSwitchTo(aggcontext);
+ 		if (state->strInfo != NULL)
+ 			appendStringInfoString(state->strInfo, state->delimiter);
+ 		else
+ 			state->strInfo = makeStringInfo();
+ 			
+ 		appendStringInfoString(state->strInfo, value);
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	
+ 	return state;
+ }
+ 
+ Datum
+ listagg1_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext	aggcontext;
+ 	ListAggState *state = NULL;
+ 	text *elem;
+ 
+ 	if (fcinfo->context && IsA(fcinfo->context, AggState))
+ 		aggcontext = ((AggState *) fcinfo->context)->aggcontext;
+ 	else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) fcinfo->context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "listagg2_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 	
+ 	state = PG_ARGISNULL(0) ? NULL : (ListAggState *) PG_GETARG_POINTER(0);
+ 	elem = PG_ARGISNULL(1) ? NULL : PG_GETARG_TEXT_P(1);
+ 	
+ 	state = accumStringResult(state,
+ 					    elem,
+ 					    NULL,
+ 					    aggcontext);
+ 
+ 	/*
+ 	 * The transition type for listagg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.	
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum 
+ listagg2_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext	aggcontext;
+ 	ListAggState *state = NULL;
+ 	text *elem;
+ 	text *delimiter = NULL;
+ 
+ 	if (fcinfo->context && IsA(fcinfo->context, AggState))
+ 		aggcontext = ((AggState *) fcinfo->context)->aggcontext;
+ 	else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) fcinfo->context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "listagg2_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 	
+ 	state = PG_ARGISNULL(0) ? NULL : (ListAggState *) PG_GETARG_POINTER(0);
+ 	elem = PG_ARGISNULL(1) ? NULL : PG_GETARG_TEXT_P(1);
+ 	delimiter = PG_ARGISNULL(2) ? NULL : PG_GETARG_TEXT_P(2);
+ 	
+ 	state = accumStringResult(state,
+ 					    elem,
+ 					    delimiter,
+ 					    aggcontext);
+ 
+ 	/*
+ 	 * The transition type for listagg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.	
+ 	 */
+ 	PG_RETURN_POINTER(state);	
+ }
+ 
+ Datum
+ listagg_finalfn(PG_FUNCTION_ARGS)
+ {
+ 	ListAggState *state = NULL;
+ 
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 		
+ 	/* cannot be called directly because of internal-type argument */
+ 	Assert(fcinfo->context &&
+ 		    (IsA(fcinfo->context, AggState) ||
+ 			    IsA(fcinfo->context, WindowAggState)));
+ 	
+ 	state = (ListAggState *) PG_GETARG_POINTER(0);
+ 	if (state->strInfo != NULL)
+ 		PG_RETURN_TEXT_P(cstring_to_text(state->strInfo->data));
+ 	else
+ 		PG_RETURN_NULL();    
+ }
diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 3d232d2..b352243 100644
*** a/src/include/catalog/pg_aggregate.h
--- b/src/include/catalog/pg_aggregate.h
*************** DATA(insert ( 2901 xmlconcat2	  -					0	
*** 223,228 ****
--- 223,232 ----
  /* array */
  DATA(insert ( 2335	array_agg_transfn	array_agg_finalfn		0	2281	_null_ ));
  
+ /* varchar */
+ DATA(insert (3030	listagg1_transfn	listagg_finalfn			0	2281	_null_ ));
+ DATA(insert (3031	listagg2_transfn	listagg_finalfn			0	2281	_null_ ));
+ 
  /*
   * prototypes for functions in pg_aggregate.c
   */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c4320e5..db7a818 100644
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DESCR("convert int4 number to hex");
*** 2252,2257 ****
--- 2252,2268 ----
  DATA(insert OID =  2090 ( to_hex	   PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "20" _null_ _null_ _null_ _null_ to_hex64 _null_ _null_ _null_ ));
  DESCR("convert int8 number to hex");
  
+ DATA(insert OID = 2995 (  listagg1_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 2 0 2281 "2281 25" _null_ _null_ _null_ _null_ listagg1_transfn _null_ _null_ _null_ ));
+ DESCR("listagg one param transition function");
+ DATA(insert OID = 2996 (  listagg2_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 3 0 2281 "2281 25 25" _null_ _null_ _null_ _null_ listagg2_transfn _null_ _null_ _null_ ));
+ DESCR("listagg two params transition function");
+ DATA(insert OID = 2999 (  listagg_finalfn   PGNSP PGUID 12 1 0 0 f f f f f i 1 0 25 "2281" _null_ _null_ _null_ _null_ listagg_finalfn _null_ _null_ _null_ ));
+ DESCR("listagg final function");
+ DATA(insert OID = 3030 (  listagg		   PGNSP PGUID 12 1 0 0 t f f f f i 1 0 25 "25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string");
+ DATA(insert OID = 3031 (  listagg		   PGNSP PGUID 12 1 0 0 t f f f f i 2 0 25 "25 25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string with delimiter");
+ 
  /* for character set encoding support */
  
  /* return database encoding name */
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 72e9ed0..076e713 100644
*** a/src/include/utils/builtins.h
--- b/src/include/utils/builtins.h
*************** extern Datum varchartypmodin(PG_FUNCTION
*** 666,671 ****
--- 666,675 ----
  extern Datum varchartypmodout(PG_FUNCTION_ARGS);
  extern Datum varchar(PG_FUNCTION_ARGS);
  
+ extern Datum listagg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum listagg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum listagg_finalfn(PG_FUNCTION_ARGS);
+ 
  /* varlena.c */
  extern text *cstring_to_text(const char *s);
  extern text *cstring_to_text_with_len(const char *s, int len);
diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out
index 9669a52..84a0c94 100644
*** a/src/test/regress/expected/aggregates.out
--- b/src/test/regress/expected/aggregates.out
*************** select aggfns(distinct a,a,c order by a,
*** 799,801 ****
--- 799,832 ----
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
  LINE 1: select aggfns(distinct a,a,c order by a,b)
                                                  ^
+ -- list agg test
+ select listagg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+    listagg    
+ --------------
+  aaaabbbbcccc
+ (1 row)
+ 
+ select listagg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+     listagg     
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select listagg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+     listagg     
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select listagg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+   listagg  
+ -----------
+  bbbb,cccc
+ (1 row)
+ 
+ select listagg(a,',') from (values(null),(null)) g(a);
+  listagg 
+ ---------
+  
+ (1 row)
+ 
diff --git a/src/test/regress/sql/aggregates.sql b/src/test/regress/sql/aggregates.sql
index 18f2e57..fd165d6 100644
*** a/src/test/regress/sql/aggregates.sql
--- b/src/test/regress/sql/aggregates.sql
*************** select aggfns(distinct a,b,c order by a,
*** 355,357 ****
--- 355,364 ----
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
  select aggfns(distinct a,a,c order by a,b)
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
+ 
+ -- list agg test
+ select listagg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select listagg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select listagg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+ select listagg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+ select listagg(a,',') from (values(null),(null)) g(a);

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#1)

Re: Review: listagg aggregate

Hello

2010/1/22 David E. Wheeler <david@kineticode.com>:

Pavel,

My review of your listagg patch.

Submission Review
-----------------
* The diff is a context diff and applies cleanly to HEAD (with just two hunks offset by 2 lines each).

* There is documentation, though I'm not sure it needs to be mentioned in the string functions documentation. No harm in it, I guess.

I would like to see an example, though, and the documentation does not currently explain what each of the parameters are for. In fact, it looks like all the existing aggregates take only one parameter, so there was not previously a need to explain it. But listagg() has an optional second param. I think that the description should explain what it's for.

can I help with it, please. My English is terrible.

* There are tests and they look fine.

Usability Review
----------------
* The patch does in fact implement the aggregate function it describes, and OH YES do we want it (I've written my own in SQL a few times).

* No, we don't already have it.

* Yes it follows community-agreed behavior. I'm assuming that there is no special parsing of aggregate functions, so the simple use of commas to separate the two parameters is appropriate, rather than using a keyword like MySQL's SEPARATOR in the group_concat() aggregate.

* No need to have pg_dump support, no dangers that I can see, looks like all the bases have been covered.

Feature Test
------------
* Everything built cleanly, but I got an OID dupe error when I tried to init the DB. Looks like 2997 and 2998 have been used for something else since you created the patch. I changed them to 2995 and 2996 and then it worked.
* The feature appears to work. I didn't see any tests for encodings or other data types, so I ran a few myself and they work fine:

postgres=# select listagg(a, U&'-\0441\043B\043E\043D-') from (values('aaaa'),('bbbb'),('cccc'
listagg
--------------------------
aaaa-слон-bbbb-слон-cccc
(1 row)

postgres=# select listagg(a, U&'\2014') from (values(U&'\0441\043B\043E\043D'),(U&'d\0061t\+000061'),(U&'\0441\043B\043E\043D')) AS g(a);
listagg
----------------
слон—data—слон
(1 row)

postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a);
listagg
---------
123
(1 row)

Performance Review
------------------

No performance issues, except that it should be faster than a custom aggregate that does the same thing. To test, I created a quick custom aggregate (no second argument, alas, so listagg() is more flexible) like so:

CREATE OR REPLACE FUNCTION a2s(ANYARRAY)
RETURNS TEXT LANGUAGE SQL AS $$
SELECT array_to_string($1, ',');
$$;

CREATE AGGREGATE string_accum (
SFUNC = array_accum,
BASETYPE = ANYELEMENT,
STYPE = ANYARRAY,
INITCOND = '{}',
FINALFUNC = a2s
);

Then I ran some simple tests (thanks for the clue, depesz):

postgres=# select count(*) from (select string_accum(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a), generate_series(1,10000) i) AS x(i);
count
-------
1
(1 row)

Time: 1365.382 ms

postgres=# select count(*) from (select listagg(a) from (values('aaaa'),('bbbb'),('cccc')) AS g(a), generate_series(1,10000) i) AS x(i);
count
-------
1
(1 row)

Time: 17.989 ms

So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, and my system is built with --enable-cassert and --enable-debug. Still, good job.

Coding Review
-------------

* Is varchar.c really the best place to put the ListAggState struct and the listagg() function? I grepped the source for array_agg() and it's in src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for string functions? Otherwise, the style of the C code looks fine to my untrained eye.

array user functions are used more time in pg core. The complexity of
array functions are much higher, so I don't think we need special
file.

Actually, shouldn't it return text rather than varchar?

I'll recheck it. I am sure so all parameters should be a text.

* Does it really require four functions to do its work? Might there be some way to use the array_agg() C functions and then just a different final function to turn it into a string (using the internal array_to_string() function, perhaps)?

We can, but it isn't good way. Processing of arrays is little bit more
expensive then processing plain text. It is reason why listagg is
faster, than your custom aggregate. More, the final function could be
faster - the content is final.

I'm not at all sure about it, but given how little code was required
to create the same basic functionality in SQL, I'm surprised that the
C implementation requires four functions (accumStringResult(),
listagg1_transfn(), listagg2_transfn(), and listagg_finalfn()). Maybe
they're required to make it fast and avoid the overhead of an array?

It normal for aggregate functions. We need more transfn function,
because we need two two variant: listagg(col), listagg(col, sep). Our
coding guidlines doesn't advice share C functions - but these
functions are +/- wrapper for accumStringResult - so there is zero
overhead.

* No compiler warnings, I never made it crash, good comments, does what it says on the tin. I doubt that there are any portability issues, as the code seems to use standard PostgreSQL internal macros and functions.

Architecture Review
-------------------

* No dependencies, things seem to make sense overall, notwithstanding my questions in the Coding Review.

Review Review
-------------

The only thing I haven't covered so far is the name. I agree with Tom's assertion that the name is awful. Sure there may be a precedent in Oracle, but I hardly find that convincing (some of the big corporations seem to do a really shitty job naming things in their APIs). Given that we already have array_agg(), what about simply concat_agg(), as Tom (wincingly, I grant you) suggested? Or string_agg()?

I don't think. When we have function, with same parameters, same
behave like some Oracle function, then I am strongly prefer Oracle
name. I don't see any benefit from different name. It can only confuse
developers and add the trable to people who porting applications.

My bike shed is puce.

Attached is a new patch with the changed OIDs and an added phrase in the documentation that indicates that the second argument can be used to separate the concatenated values.

Best,

Thank you very much
Regards
Pavel

Show quoted text

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Pavel Stehule (#2)

Re: Review: listagg aggregate

On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote:

can I help with it, please. My English is terrible.

Yes, I added a bit in the patch I submitted.

array user functions are used more time in pg core. The complexity of
array functions are much higher, so I don't think we need special
file.

Okay. Should have tried it in PL/pgSQL then, perhaps.

I'll recheck it. I am sure so all parameters should be a text.

Probably shouldn't go into varchar.c then, yes?

We can, but it isn't good way. Processing of arrays is little bit more
expensive then processing plain text. It is reason why listagg is
faster, than your custom aggregate. More, the final function could be
faster - the content is final.

Understood.

It normal for aggregate functions. We need more transfn function,
because we need two two variant: listagg(col), listagg(col, sep). Our
coding guidlines doesn't advice share C functions - but these
functions are +/- wrapper for accumStringResult - so there is zero
overhead.

Ah, okay, it's because of the second argument. Now I understand.

I don't think. When we have function, with same parameters, same
behave like some Oracle function, then I am strongly prefer Oracle
name. I don't see any benefit from different name. It can only confuse
developers and add the trable to people who porting applications.

Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it.

Best,

David

Simon Riggs

simon@2ndQuadrant.com

almost 16 years ago

In reply to: David E. Wheeler (#1)

Re: Review: listagg aggregate

On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:

No performance issues

ISTM that this class of function is inherently dangerous performance
wise.

* It looks incredibly easy to construct enormous lists. We should test
the explosion limit of this to see how it is handled. Perhaps we need
some parameter limits to control that, depending upon results.

* Optimizer doesn't consider whether the result type of an aggregate get
bigger as the aggregate processes more rows. If we're adding this
function we should give some thought in that area also, or at least a
comment to note that it can and will cause the optimizer problems in
complex queries.

--
Simon Riggs www.2ndQuadrant.com

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Simon Riggs (#4)

Re: Review: listagg aggregate

2010/1/24 Simon Riggs <simon@2ndquadrant.com>:

On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:

No performance issues

ISTM that this class of function is inherently dangerous performance
wise.

there is potencial risk, but this risk isn't new. The almost all what
you say is true for array aggregates.

* It looks incredibly easy to construct enormous lists. We should test
the explosion limit of this to see how it is handled. Perhaps we need
some parameter limits to control that, depending upon results.

There are no limit for generating large values - like bytea, xml, or
text. There are not limit for array_accum or array(query). So, I don't
think we need some special mechanism for listagg. If somebody will
generate too large a value, then he will get "out of memory"
exception.

* Optimizer doesn't consider whether the result type of an aggregate get
bigger as the aggregate processes more rows. If we're adding this
function we should give some thought in that area also, or at least a
comment to note that it can and will cause the optimizer problems in
complex queries.

this is true, but this isn't some new. array_accum working well
without optimizer problems.

Show quoted text

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Simon Riggs (#4)

Re: Review: listagg aggregate

Simon Riggs <simon@2ndQuadrant.com> writes:

On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:

No performance issues

ISTM that this class of function is inherently dangerous performance
wise.

* It looks incredibly easy to construct enormous lists. We should test
the explosion limit of this to see how it is handled. Perhaps we need
some parameter limits to control that, depending upon results.

* Optimizer doesn't consider whether the result type of an aggregate get
bigger as the aggregate processes more rows. If we're adding this
function we should give some thought in that area also, or at least a
comment to note that it can and will cause the optimizer problems in
complex queries.

We have that problem already with array_agg(), and I don't recall many
complaints about it. It might be worth worrying about at some point,
but I don't think it's reasonable to insist that it be fixed before
any more such aggregates are created.

I agree that testing-to-failure would be a good idea just to be sure it
fails cleanly.

regards, tom lane

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#3)

Re: Review: listagg aggregate

2010/1/24 David E. Wheeler <david@kineticode.com>:

On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote:

can I help with it, please. My English is terrible.

Yes, I added a bit in the patch I submitted.

array user functions are used more time in pg core. The complexity of
array functions are much higher, so I don't think we need special
file.

Okay. Should have tried it in PL/pgSQL then, perhaps.

:) - I'll look on it again.

I'll recheck it. I am sure so all parameters should be a text.

Probably shouldn't go into varchar.c then, yes?

We can, but it isn't good way. Processing of arrays is little bit more
expensive then processing plain text. It is reason why listagg is
faster, than your custom aggregate. More, the final function could be
faster - the content is final.

Understood.

It normal for aggregate functions. We need more transfn function,
because we need two two variant: listagg(col), listagg(col, sep). Our
coding guidlines doesn't advice share C functions - but these
functions are +/- wrapper for accumStringResult - so there is zero
overhead.

Ah, okay, it's because of the second argument. Now I understand.

I don't think. When we have function, with same parameters, same
behave like some Oracle function, then I am strongly prefer Oracle
name. I don't see any benefit from different name. It can only confuse
developers and add the trable to people who porting applications.

Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it.

The "list" is common name for this content - it is usual in Microsoft
SQL Server, it is usual in Oracle. Maybe we can vote about the name

Regards
Pavel Stehule

Show quoted text

Best,

David

Scott Bailey

artacus@comcast.net

almost 16 years ago

In reply to: David E. Wheeler (#3)

Re: Review: listagg aggregate

I don't think. When we have function, with same parameters, same
behave like some Oracle function, then I am strongly prefer Oracle
name. I don't see any benefit from different name. It can only confuse
developers and add the trable to people who porting applications.

Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it.

The corresponding function in Oracle is called wm_concat. In MySQL its
called group_concat. I don't believe DB2 or SQL Server have built in
equivalents. The Oracle name isn't really an option ("wm' stands for
workspace manager)

I think listagg or string_agg would be the most appropriate names. Oh
and before Oracle had wm_concat, Tom Kyte wrote a function called stragg
that was pretty popular.

Scott

Peter Eisentraut

peter_e@gmx.net

almost 16 years ago

In reply to: Scott Bailey (#8)

Re: Review: listagg aggregate

On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote:

I think listagg or string_agg would be the most appropriate names. Oh
and before Oracle had wm_concat, Tom Kyte wrote a function called
stragg that was pretty popular.

Well,

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.

#10

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Peter Eisentraut (#9)

Re: Review: listagg aggregate

2010/1/25 Peter Eisentraut <peter_e@gmx.net>:

On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote:

I think listagg or string_agg would be the most appropriate names. Oh
and before Oracle had wm_concat, Tom Kyte wrote a function called
stragg that was pretty popular.

Well,

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.

when you define list as text domain, then this the name is correct.

searching "list sql string" on google. You can see, so "list" is usually used.

Regards
Pavel

Show quoted text

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Pavel Stehule (#10)

Re: Review: listagg aggregate

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/1/25 Peter Eisentraut <peter_e@gmx.net>:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.

when you define list as text domain, then this the name is correct.

IOW, if you define away the problem then there's no problem?

I agree that "list" is a terrible choice of name here. "string_agg"
seemed reasonable and in keeping with the standardized "array_agg".

regards, tom lane

#12

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Tom Lane (#11)

Re: Review: listagg aggregate

2010/1/25 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/1/25 Peter Eisentraut <peter_e@gmx.net>:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.

when you define list as text domain, then this the name is correct.

IOW, if you define away the problem then there's no problem?

I agree that "list" is a terrible choice of name here. "string_agg"
seemed reasonable and in keeping with the standardized "array_agg".

I am not happy, I thing so we do bigger chaos then it is. But it
hasn't progress. So I agree with name string_agg. In this case isn't a
problem rename this function if somebody would.

I'll send patch over hour.

regards
Pavel Stehule

Show quoted text

regards, tom lane

#13

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Tom Lane (#11)

1 attachment(s)

Re: Review: listagg aggregate

2010/1/25 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/1/25 Peter Eisentraut <peter_e@gmx.net>:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.

when you define list as text domain, then this the name is correct.

IOW, if you define away the problem then there's no problem?

I agree that "list" is a terrible choice of name here. "string_agg"
seemed reasonable and in keeping with the standardized "array_agg".

actualised patch - the name is string_agg

regards
Pavel Stehule

Show quoted text

regards, tom lane

Attachments:

string_agg.diffapplication/octet-stream; name=string_agg.diffDownload

*** ./doc/src/sgml/func.sgml.orig	2010-01-19 06:50:18.000000000 +0100
--- ./doc/src/sgml/func.sgml	2010-01-25 15:48:05.050382968 +0100
***************
*** 1789,1794 ****
--- 1789,1798 ----
      </tgroup>
     </table>
  
+    <para>
+    See also <xref linkend="functions-aggregate"> about the aggregate
+    function <function>string_agg</function>.
+    </para>
  
     <table id="conversion-names">
      <title>Built-in Conversions</title>
***************
*** 9811,9816 ****
--- 9815,9837 ----
       </row>
  
       <row>
+       <entry>
+        <indexterm>
+         <primary>string_agg</primary>
+        </indexterm>
+        <function>string_agg(<replaceable class="parameter">expression</replaceable> 
+                      [, <replaceable class="parameter">expression</replaceable> ] )</function>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>input values concatenated into an string, optionally separated by the second argument</entry>
+      </row>
+ 
+      <row>
        <entry><function>sum(<replaceable class="parameter">expression</replaceable>)</function></entry>
        <entry>
         <type>smallint</type>, <type>int</type>,
*** ./src/backend/utils/adt/varlena.c.orig	2010-01-02 17:57:55.000000000 +0100
--- ./src/backend/utils/adt/varlena.c	2010-01-25 15:34:13.259507851 +0100
***************
*** 18,26 ****
--- 18,28 ----
  
  #include "access/tuptoaster.h"
  #include "catalog/pg_type.h"
+ #include "lib/stringinfo.h"
  #include "libpq/md5.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
+ #include "nodes/execnodes.h"
  #include "parser/scansup.h"
  #include "regex/regex.h"
  #include "utils/builtins.h"
***************
*** 48,53 ****
--- 50,67 ----
  	int			skiptable[256]; /* skip distance for given mismatched char */
  } TextPositionState;
  
+ /* type for state data of string_agg function */
+ typedef struct
+ {
+ 	StringInfo	strInfo;
+ 	char	delimiter[1];		/* separator string - one or more chars */
+ } StringAggState;
+ 
+ static StringAggState *accumStringResult(StringAggState *state, 
+ 						    text *elem, 
+ 						    text *delimiter, 
+ 						    MemoryContext aggcontext);
+ 
  #define DatumGetUnknownP(X)			((unknown *) PG_DETOAST_DATUM(X))
  #define DatumGetUnknownPCopy(X)		((unknown *) PG_DETOAST_DATUM_COPY(X))
  #define PG_GETARG_UNKNOWN_P(n)		DatumGetUnknownP(PG_GETARG_DATUM(n))
***************
*** 3143,3145 ****
--- 3157,3311 ----
  
  	PG_RETURN_INT32(result);
  }
+ 
+ /*
+  * string_agg
+  *  
+  * Concates values and returns string.
+  *
+  * Syntax:
+  *     FUNCTION string_agg(string text, delimiter text = '')
+  *      RETURNS text;
+  *
+  * Note: any NULL value is ignored.
+  */
+ static StringAggState *
+ accumStringResult(StringAggState *state, text *elem, text *delimiter, 
+ 						    MemoryContext aggcontext)
+ {
+ 	MemoryContext	oldcontext;
+ 	
+ 	/* 
+ 	 * when state is NULL, create new state value.
+ 	 */
+ 	if (state == NULL)
+ 	{
+ 		if (delimiter != NULL)
+ 		{
+ 			char *dstr = text_to_cstring(delimiter);
+ 			int len = strlen(dstr);
+ 			
+ 			oldcontext = MemoryContextSwitchTo(aggcontext);
+ 			state = palloc(sizeof(StringAggState) + len);
+ 			
+ 			/* copy delimiter to state var */
+ 			memcpy(&state->delimiter, dstr, len + 1);
+ 		}
+ 		else
+ 		{
+ 			oldcontext = MemoryContextSwitchTo(aggcontext);
+ 			state = palloc(sizeof(StringAggState));
+ 			state->delimiter[0] = '\0';
+ 		}
+ 		
+ 		/* Initialise StringInfo */
+ 		state->strInfo = NULL;
+ 		
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	
+ 	/* only when element isn't null */
+ 	if (elem != NULL)
+ 	{
+ 		char	*value = text_to_cstring(elem);
+ 
+ 		oldcontext = MemoryContextSwitchTo(aggcontext);
+ 		if (state->strInfo != NULL)
+ 			appendStringInfoString(state->strInfo, state->delimiter);
+ 		else
+ 			state->strInfo = makeStringInfo();
+ 			
+ 		appendStringInfoString(state->strInfo, value);
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	
+ 	return state;
+ }
+ 
+ Datum
+ string_agg1_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext	aggcontext;
+ 	StringAggState *state = NULL;
+ 	text *elem;
+ 
+ 	if (fcinfo->context && IsA(fcinfo->context, AggState))
+ 		aggcontext = ((AggState *) fcinfo->context)->aggcontext;
+ 	else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) fcinfo->context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "string_agg1_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 	
+ 	state = PG_ARGISNULL(0) ? NULL : (StringAggState *) PG_GETARG_POINTER(0);
+ 	elem = PG_ARGISNULL(1) ? NULL : PG_GETARG_TEXT_PP(1);
+ 	
+ 	state = accumStringResult(state,
+ 					    elem,
+ 					    NULL,
+ 					    aggcontext);
+ 
+ 	/*
+ 	 * The transition type for string_agg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.	
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum 
+ string_agg2_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext	aggcontext;
+ 	StringAggState *state = NULL;
+ 	text *elem;
+ 	text *delimiter = NULL;
+ 
+ 	if (fcinfo->context && IsA(fcinfo->context, AggState))
+ 		aggcontext = ((AggState *) fcinfo->context)->aggcontext;
+ 	else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) fcinfo->context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "string_agg2_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 	
+ 	state = PG_ARGISNULL(0) ? NULL : (StringAggState *) PG_GETARG_POINTER(0);
+ 	elem = PG_ARGISNULL(1) ? NULL : PG_GETARG_TEXT_PP(1);
+ 	delimiter = PG_ARGISNULL(2) ? NULL : PG_GETARG_TEXT_PP(2);
+ 	
+ 	state = accumStringResult(state,
+ 					    elem,
+ 					    delimiter,
+ 					    aggcontext);
+ 
+ 	/*
+ 	 * The transition type for string_agg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum
+ string_agg_finalfn(PG_FUNCTION_ARGS)
+ {
+ 	StringAggState *state = NULL;
+ 
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 		
+ 	/* cannot be called directly because of internal-type argument */
+ 	Assert(fcinfo->context &&
+ 		    (IsA(fcinfo->context, AggState) ||
+ 			    IsA(fcinfo->context, WindowAggState)));
+ 	
+ 	state = (StringAggState *) PG_GETARG_POINTER(0);
+ 	if (state->strInfo != NULL)
+ 		PG_RETURN_TEXT_P(cstring_to_text(state->strInfo->data));
+ 	else
+ 		PG_RETURN_NULL();
+ }
*** ./src/include/catalog/pg_aggregate.h.orig	2010-01-05 02:06:56.000000000 +0100
--- ./src/include/catalog/pg_aggregate.h	2010-01-25 15:14:08.953574327 +0100
***************
*** 223,228 ****
--- 223,232 ----
  /* array */
  DATA(insert ( 2335	array_agg_transfn	array_agg_finalfn		0	2281	_null_ ));
  
+ /* text */
+ DATA(insert (3034	string_agg1_transfn	string_agg_finalfn			0	2281	_null_ ));
+ DATA(insert (3035	string_agg2_transfn	string_agg_finalfn			0	2281	_null_ ));
+ 
  /*
   * prototypes for functions in pg_aggregate.c
   */
*** ./src/include/catalog/pg_proc.h.orig	2010-01-19 15:11:32.000000000 +0100
--- ./src/include/catalog/pg_proc.h	2010-01-25 15:19:30.631574450 +0100
***************
*** 2818,2823 ****
--- 2818,2834 ----
  DATA(insert OID = 2817 (  float8_corr				PGNSP PGUID 12 1 0 0 f f f t f i 1 0 701 "1022" _null_ _null_ _null_ _null_ float8_corr _null_ _null_ _null_ ));
  DESCR("CORR(double, double) aggregate final function");
  
+ DATA(insert OID = 3031 (  string_agg1_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 2 0 2281 "2281 25" _null_ _null_ _null_ _null_ string_agg1_transfn _null_ _null_ _null_ ));
+ DESCR("string_agg one param transition function");
+ DATA(insert OID = 3032 (  string_agg2_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 3 0 2281 "2281 25 25" _null_ _null_ _null_ _null_ string_agg2_transfn _null_ _null_ _null_ ));
+ DESCR("string_agg two params transition function");
+ DATA(insert OID = 3033 (  string_agg_finalfn   PGNSP PGUID 12 1 0 0 f f f f f i 1 0 25 "2281" _null_ _null_ _null_ _null_ string_agg_finalfn _null_ _null_ _null_ ));
+ DESCR("string_agg final function");
+ DATA(insert OID = 3034 (  string_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 1 0 25 "25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string");
+ DATA(insert OID = 3035 (  string_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 2 0 25 "25 25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string with delimiter");
+ 
  /* To ASCII conversion */
  DATA(insert OID = 1845 ( to_ascii	PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_	to_ascii_default _null_ _null_ _null_ ));
  DESCR("encode text from DB encoding to ASCII text");
*** ./src/include/utils/builtins.h.orig	2010-01-19 06:50:18.000000000 +0100
--- ./src/include/utils/builtins.h	2010-01-25 15:22:44.784448955 +0100
***************
*** 722,727 ****
--- 722,731 ----
  
  extern Datum pg_column_size(PG_FUNCTION_ARGS);
  
+ extern Datum string_agg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
+ 
  /* version.c */
  extern Datum pgsql_version(PG_FUNCTION_ARGS);
  
***************
*** 770,775 ****
--- 774,782 ----
  extern Datum chr (PG_FUNCTION_ARGS);
  extern Datum repeat(PG_FUNCTION_ARGS);
  extern Datum ascii(PG_FUNCTION_ARGS);
+ extern Datum string_agg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
  
  /* inet_net_ntop.c */
  extern char *inet_net_ntop(int af, const void *src, int bits,
*** ./src/test/regress/expected/aggregates.out.orig	2009-12-15 18:57:47.000000000 +0100
--- ./src/test/regress/expected/aggregates.out	2010-01-25 15:54:06.000000000 +0100
***************
*** 799,801 ****
--- 799,832 ----
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
  LINE 1: select aggfns(distinct a,a,c order by a,b)
                                                  ^
+ -- string_agg tests
+ select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+   string_agg  
+ --------------
+  aaaabbbbcccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+    string_agg   
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+    string_agg   
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+  string_agg 
+ ------------
+  bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values(null),(null)) g(a);
+  string_agg 
+ ------------
+  
+ (1 row)
+ 
*** ./src/test/regress/sql/aggregates.sql.orig	2009-12-15 18:57:48.000000000 +0100
--- ./src/test/regress/sql/aggregates.sql	2010-01-25 15:53:48.406291403 +0100
***************
*** 355,357 ****
--- 355,364 ----
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
  select aggfns(distinct a,a,c order by a,b)
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
+ 
+ -- string_agg tests
+ select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values(null),(null)) g(a);

#14

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Tom Lane (#6)

Re: Review: listagg aggregate

2010/1/24 Tom Lane <tgl@sss.pgh.pa.us>:

Simon Riggs <simon@2ndQuadrant.com> writes:

On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:

No performance issues

ISTM that this class of function is inherently dangerous performance
wise.

* It looks incredibly easy to construct enormous lists. We should test
the explosion limit of this to see how it is handled. Perhaps we need
some parameter limits to control that, depending upon results.

* Optimizer doesn't consider whether the result type of an aggregate get
bigger as the aggregate processes more rows. If we're adding this
function we should give some thought in that area also, or at least a
comment to note that it can and will cause the optimizer problems in
complex queries.

We have that problem already with array_agg(), and I don't recall many
complaints about it. It might be worth worrying about at some point,
but I don't think it's reasonable to insist that it be fixed before
any more such aggregates are created.

I agree that testing-to-failure would be a good idea just to be sure it
fails cleanly.

postgres=# \timing
Timing is on.
postgres=# select
pg_size_pretty(length(string_agg('012345678901234567890'::text,',')))
from generate_series(1,10000000) g(i);
pg_size_pretty
----------------
210 MB
(1 row)

Time: 5831,218 ms
postgres=# select
pg_size_pretty(length(string_agg('012345678901234567890'::text,',')))
from generate_series(1,50000000) g(i);
^[^[ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by
21 more bytes.
postgres=#

I thing, so 210 MB is more then is necessary :)

Regards
Pavel Stehule

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Peter Eisentraut (#9)

Re: Review: listagg aggregate

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

concat_agg().

David

#16

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Pavel Stehule (#12)

Re: Review: listagg aggregate

On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote:

I am not happy, I thing so we do bigger chaos then it is. But it
hasn't progress. So I agree with name string_agg. In this case isn't a
problem rename this function if somebody would.

Could you not use CREATE AGGREGATE to create an alias named listagg if you needed it? Or are we limited to a single argument in that case?

David

#17

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#15)

Re: Review: listagg aggregate

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

concat_agg().

I like that one...

...Robert

#18

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#16)

Re: Review: listagg aggregate

2010/1/25 David E. Wheeler <david@kineticode.com>:

On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote:

I am not happy, I thing so we do bigger chaos then it is. But it
hasn't progress. So I agree with name string_agg. In this case isn't a
problem rename this function if somebody would.

Could you not use CREATE AGGREGATE to create an alias named listagg if you needed it? Or are we limited to a single argument in that case?

we can, but without one common known name we will have a propriety name.

Pavel

Show quoted text

David

#19

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Robert Haas (#17)

Re: Review: listagg aggregate

2010/1/25 Robert Haas <robertmhaas@gmail.com>:

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

concat_agg().

I like that one...

why is concat_agg better than listagg ?

Pavel

Show quoted text

...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Pavel Stehule (#19)

Re: Review: listagg aggregate

On Jan 25, 2010, at 23:14, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

why is concat_agg better than listagg ?

Because it's an aggregate that cocatenates values. It's not an
aggregate that lists things. I also like concat_agg better than
string_agg because it's not limited to acting on strings.

Best,

David

#21

Alastair Turner

bell@ctrlf5.co.za

almost 16 years ago

In reply to: David E. Wheeler (#20)

Re: Review: listagg aggregate

On Tue, Jan 26, 2010 at 1:08 PM, David E. Wheeler <david@kineticode.com> wrote:

.....

Because it's an aggregate that cocatenates values. It's not an aggregate
that lists things. I also like concat_agg better than string_agg because
it's not limited to acting on strings.

.....

Given that it potentially produces a delimited list, not a straight
conacatenation (and that list is unacceptable since it would be
descriptive as a noun but not as a verb) would implode_agg not be the
most descriptive name?

#22

Alastair Turner

bell@ctrlf5.co.za

almost 16 years ago

In reply to: Alastair Turner (#21)

Re: Review: listagg aggregate

On Tue, Jan 26, 2010 at 1:23 PM, Alastair Turner <bell@ctrlf5.co.za> wrote:

.....

Given that it potentially produces a delimited list, not a straight
conacatenation (and that list is unacceptable since it would be
descriptive as a noun but not as a verb) would implode_agg not be the
most descriptive name?

Actually, scratch that. The other *agg functions are named for what
they produce as output. Not for their process - as per the objection
to list_agg and suggestions of conact_agg and implode_agg. string_agg
would be consistent, which is a wonderful thing if you can get it in a
naming scheme.

#23

Peter Eisentraut

peter_e@gmx.net

almost 16 years ago

In reply to: David E. Wheeler (#20)

Re: Review: listagg aggregate

On tis, 2010-01-26 at 03:08 -0800, David E. Wheeler wrote:

On Jan 25, 2010, at 23:14, Pavel Stehule <pavel.stehule@gmail.com>
wrote:

why is concat_agg better than listagg ?

Because it's an aggregate that cocatenates values. It's not an
aggregate that lists things. I also like concat_agg better than
string_agg because it's not limited to acting on strings.

What else can it act on?

#24

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#19)

Re: Review: listagg aggregate

On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/1/25 Robert Haas <robertmhaas@gmail.com>:

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

concat_agg().

I like that one...

why is concat_agg better than listagg ?

Because it doesn't make lists.

Honestly, I don't love concat_agg() either - why should something need
to have agg in the name just because it's an aggregate? I think the
most descriptive name would be something like
concatenate_with_separator(), but that's kind of long.

...Robert

#25

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Robert Haas (#24)

Re: Review: listagg aggregate

2010/1/26 Robert Haas <robertmhaas@gmail.com>:

On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/1/25 Robert Haas <robertmhaas@gmail.com>:

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

xmlagg -> concatenates values to form xml datum
array_agg -> concatenates values to form array datum
??? -> concatenates values to form string datum

concat_agg().

I like that one...

why is concat_agg better than listagg ?

Because it doesn't make lists.

Honestly, I don't love concat_agg() either - why should something need
to have agg in the name just because it's an aggregate? I think the
most descriptive name would be something like
concatenate_with_separator(), but that's kind of long.

This is never ending story :)

MySQL has function concate_ws - but this function has different semantic.

I thing so string_agg is short, and from one view consistent

Pavel

Show quoted text

...Robert

#26

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Peter Eisentraut (#23)

Re: Review: listagg aggregate

On Jan 26, 2010, at 4:03, Peter Eisentraut <peter_e@gmx.net> wrote:

What else can it act on?

Any data type, since they all can be converted to text. Integers would
be a common choice.

David

#27

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Pavel Stehule (#19)

Re: Review: listagg aggregate

Pavel Stehule <pavel.stehule@gmail.com> writes:

2010/1/25 Robert Haas <robertmhaas@gmail.com>:

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler <david@kineticode.com> wrote:

concat_agg().

I like that one...

why is concat_agg better than listagg ?

It isn't ... it's the wrong part of speech. "concat"enate is a verb,
whereas the other functions we would like it to be named parallel to
are using nouns there.

(Yes, I know "array" can be used as a verb, but I don't think anyone
reads it that way in "array_agg"...)

regards, tom lane

#28

Kevin Grittner

Kevin.Grittner@wicourts.gov

almost 16 years ago

In reply to: Tom Lane (#27)

Re: Review: listagg aggregate

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

why is concat_agg better than listagg ?

It isn't ... it's the wrong part of speech. "concat"enate is a
verb,

Concatenation is a noun. "concat" doesn't get far enough to
distinguish.

-Kevin

#29

Erik Rijkers

er@xs4all.nl

almost 16 years ago

In reply to: Kevin Grittner (#28)

testing cvs HEAD - HS/SR - missing file

I don't know if it has anything to do with HS/SR, I haven't tried it on a single CVS vanilla
installation (yet).

To test with HS/SR, I've setup three 9.0devel instances (cvs as of today) on a single machine, one
as a primary, and two as slaves. I used the instructions in

http://wiki.postgresql.org/wiki/Streaming_Replication

and

./configure --prefix=/var/data1/pg_stuff/pg_installations/pgsql.sr_primary \
--with-pgport=6565 --quiet --enable-depend --with-openssl --with-perl \
--with-libxml --with-libxslt

These seem to work/replicate well.

Now, when restoring a 700MB dump (made with a 9.0devel pg_dump + pg_restore) into the primary,
errors like the following occur, on all three instances:

FATAL: could not open file "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such
file or directory

So, there are three logfiles:
pgsql.sr_primary/logfile
pgsql.sr_slavery/logfile
pgsql.sr_slave02/logfile

$ grep -E 'FATAL|ERROR' pgsql.sr_*/logfile
pgsql.sr_primary/logfile:ERROR: canceling autovacuum task
pgsql.sr_primary/logfile:ERROR: canceling autovacuum task
pgsql.sr_primary/logfile:FATAL: could not open file "pg_xlog/0000000100000001000000FF" (log file
1, segment 255): No such file or directory
pgsql.sr_primary/logfile:FATAL: could not open file "pg_xlog/0000000100000001000000FF" (log file
1, segment 255): No such file or directory
pgsql.sr_primary/logfile:ERROR: canceling autovacuum task
pgsql.sr_primary/logfile:ERROR: canceling autovacuum task
pgsql.sr_slave02/logfile:ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
pgsql.sr_slavery/logfile:ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory

This has happened several times, always 'segment' 255, 'log file' 1, 3 or 4.

hth,

Erik Rijkers

#30

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: David E. Wheeler (#20)

Re: Review: listagg aggregate

"David E. Wheeler" <david@kineticode.com> writes:

Because it's an aggregate that cocatenates values. It's not an
aggregate that lists things. I also like concat_agg better than
string_agg because it's not limited to acting on strings.

But what it *produces* is a string. For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.

regards, tom lane

#31

Kevin Grittner

Kevin.Grittner@wicourts.gov

almost 16 years ago

In reply to: Erik Rijkers (#29)

Re: testing cvs HEAD - HS/SR - missing file

"Erik Rijkers" <er@xs4all.nl> wrote:

FATAL: could not open file "pg_xlog/0000000100000001000000FF"
(log file 1, segment 255): No such file or directory

Yeah, log file segment numbers skip FF; they go from FE to 00 with
the "log file" number (the middle part) bumped by one. Whatever is
coming up with that segment number is wrong. (Unless this is
changing in 9.0?)

-Kevin

#32

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Tom Lane (#30)

Re: Review: listagg aggregate

On Jan 26, 2010, at 9:36 AM, Tom Lane wrote:

But what it *produces* is a string. For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.

Meh. This is all just bike-shedding. I'm fine with string_agg(), though in truth none of the names has really been great. The inclusion of "agg" in the name is unfortunate.

I'll have a look at Pavel's new patch now.

David

#33

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: David E. Wheeler (#32)

Re: Review: listagg aggregate

"David E. Wheeler" <david@kineticode.com> writes:

Meh. This is all just bike-shedding. I'm fine with string_agg(), though in truth none of the names has really been great. The inclusion of "agg" in the name is unfortunate.

Yeah, I wouldn't be for it either if it weren't for the precedent of
array_agg. I was quite surprised that the SQL committee chose that
name, because they've avoided using the term "aggregate function" at
all, but there it is ...

regards, tom lane

#34

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Pavel Stehule (#13)

Re: Review: listagg aggregate

On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote:

actualised patch - the name is string_agg

All looks fine except I'm getting this error during initdb:

creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL: could not create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(3031) is duplicated.
child process exited with exit code 1

Would you mind re-submitting with unique OIDs?

Thanks,

David

#35

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Tom Lane (#30)

Re: Review: listagg aggregate

On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David E. Wheeler" <david@kineticode.com> writes:

Because it's an aggregate that cocatenates values. It's not an
aggregate that lists things. I also like concat_agg better than
string_agg because it's not limited to acting on strings.

But what it *produces* is a string. For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.

This point is well-taken, but naming it string_agg() because it
produces a string doesn't seem quite descriptive enough. We might
someday (if we don't already) have a number of aggregates that produce
an output that is a string; we can't name them all by the output type.

...Robert

#36

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Robert Haas (#35)

Re: Review: listagg aggregate

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But what it *produces* is a string. �For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.

This point is well-taken, but naming it string_agg() because it
produces a string doesn't seem quite descriptive enough. We might
someday (if we don't already) have a number of aggregates that produce
an output that is a string; we can't name them all by the output type.

True, but the same point could be made against array_agg, and that
didn't stop the committee from choosing that name. As long as
string_agg is the "most obvious" aggregate-to-string functionality,
which ISTM it is, I think it's all right for it to have pride of place
in naming.

regards, tom lane

#37

Fujii Masao

masao.fujii@gmail.com

almost 16 years ago

In reply to: Erik Rijkers (#29)

Re: testing cvs HEAD - HS/SR - missing file

On Wed, Jan 27, 2010 at 2:19 AM, Erik Rijkers <er@xs4all.nl> wrote:

Now, when restoring a 700MB dump (made with a 9.0devel pg_dump + pg_restore) into the primary,
errors like the following occur, on all three instances:

FATAL: could not open file "pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such
file or directory

Thanks for the report!

This is the bug of SR :( I think that walsender wrongly treats the WAL-boundary.

pgsql.sr_slave02/logfile:ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
pgsql.sr_slavery/logfile:ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory

Also the ReadRecord() or its surrounding functions seem to have treated
wrongly the WAL-boundary.

I'll fix those bugs after a night's sleep.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#38

Fujii Masao

masao.fujii@gmail.com

almost 16 years ago

In reply to: Fujii Masao (#37)

1 attachment(s)

Re: testing cvs HEAD - HS/SR - missing file

On Wed, Jan 27, 2010 at 4:02 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

This is the bug of SR :( I think that walsender wrongly treats the WAL-boundary.

The attached patch would fix this bug.

pgsql.sr_slave02/logfile:ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory
pgsql.sr_slavery/logfile:ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/0000000100000001000000FF" (log file 1, segment 255): No such file or directory

Also the ReadRecord() or its surrounding functions seem to have treated
wrongly the WAL-boundary.

Oops! I've misread the log messages. This is not the bug.
Because of the FATAL error from the primary, the standby
seems to just emit an ERROR message, and exit.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

sr_xlogid_boundary.patchtext/x-patch; charset=US-ASCII; name=sr_xlogid_boundary.patchDownload

*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 661,666 **** XLogSend(StringInfo outMsg)
--- 661,673 ----
  
  		sentPtr = endptr;
  
+ 		if (sentPtr.xrecoff >= XLogFileSize)
+ 		{
+ 			/* crossing a logid boundary */
+ 			sentPtr.xlogid += 1;
+ 			sentPtr.xrecoff = 0;
+ 		}
+ 
  		/*
  		 * Read the log directly into the output buffer to prevent
  		 * extra memcpy calls.

#39

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Fujii Masao (#38)

Re: testing cvs HEAD - HS/SR - missing file

Fujii Masao wrote:

*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***************
*** 661,666 **** XLogSend(StringInfo outMsg)
--- 661,673 ----

sentPtr = endptr;

+ 		if (sentPtr.xrecoff >= XLogFileSize)
+ 		{
+ 			/* crossing a logid boundary */
+ 			sentPtr.xlogid += 1;
+ 			sentPtr.xrecoff = 0;
+ 		}
+ 
/*
* Read the log directly into the output buffer to prevent
* extra memcpy calls.

Before that, endptr is advanced using XLByteAdvance() macro, which does
handle xlogid boundaries. Is XLByteAdvance() broken?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#40

Fujii Masao

masao.fujii@gmail.com

almost 16 years ago

In reply to: Heikki Linnakangas (#39)

Re: testing cvs HEAD - HS/SR - missing file

On Wed, Jan 27, 2010 at 7:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Before that, endptr is advanced using XLByteAdvance() macro, which does
handle xlogid boundaries. Is XLByteAdvance() broken?

No. The cause of the bug is that endptr might be set to the SendRqstPtr
that has crossed a xlogid boundary in the following code.

/* if we went beyond SendRqstPtr, back off */
if (XLByteLT(SendRqstPtr, endptr))
endptr = SendRqstPtr;

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#41

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Fujii Masao (#40)

Re: testing cvs HEAD - HS/SR - missing file

Fujii Masao wrote:

On Wed, Jan 27, 2010 at 7:05 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Before that, endptr is advanced using XLByteAdvance() macro, which does
handle xlogid boundaries. Is XLByteAdvance() broken?

No. The cause of the bug is that endptr might be set to the SendRqstPtr
that has crossed a xlogid boundary in the following code.

/* if we went beyond SendRqstPtr, back off */
if (XLByteLT(SendRqstPtr, endptr))
endptr = SendRqstPtr;

But SendRqstPtr comes from LogwrtResult.Write, surely that's correct, no?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#42

Fujii Masao

masao.fujii@gmail.com

almost 16 years ago

In reply to: Heikki Linnakangas (#41)

Re: testing cvs HEAD - HS/SR - missing file

On Wed, Jan 27, 2010 at 8:37 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

But SendRqstPtr comes from LogwrtResult.Write, surely that's correct, no?

Right. But the point is that LogwrtResult.Write might indicate 0/FF000000
because it's the last byte + 1 written out. XLogRead() treats this as the
location where WAL begins to be read, so converts it to the WAL file name
0000000100000000000000FF by using XLByteToSeg.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#43

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#34)

1 attachment(s)

Re: Review: listagg aggregate

with actualised oids

Regards
Pavel Stehule

2010/1/26 David E. Wheeler <david@kineticode.com>:

Show quoted text

On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote:

actualised patch - the name is string_agg

All looks fine except I'm getting this error during initdb:

creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL: could not create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(3031) is duplicated.
child process exited with exit code 1

Would you mind re-submitting with unique OIDs?

Thanks,

David

Attachments:

string_agg.diffapplication/octet-stream; name=string_agg.diffDownload

*** ./doc/src/sgml/func.sgml.orig	2010-01-19 06:50:18.000000000 +0100
--- ./doc/src/sgml/func.sgml	2010-01-25 15:48:05.050382968 +0100
***************
*** 1789,1794 ****
--- 1789,1798 ----
      </tgroup>
     </table>
  
+    <para>
+    See also <xref linkend="functions-aggregate"> about the aggregate
+    function <function>string_agg</function>.
+    </para>
  
     <table id="conversion-names">
      <title>Built-in Conversions</title>
***************
*** 9811,9816 ****
--- 9815,9837 ----
       </row>
  
       <row>
+       <entry>
+        <indexterm>
+         <primary>string_agg</primary>
+        </indexterm>
+        <function>string_agg(<replaceable class="parameter">expression</replaceable> 
+                      [, <replaceable class="parameter">expression</replaceable> ] )</function>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>input values concatenated into an string, optionally separated by the second argument</entry>
+      </row>
+ 
+      <row>
        <entry><function>sum(<replaceable class="parameter">expression</replaceable>)</function></entry>
        <entry>
         <type>smallint</type>, <type>int</type>,
*** ./src/backend/utils/adt/varlena.c.orig	2010-01-02 17:57:55.000000000 +0100
--- ./src/backend/utils/adt/varlena.c	2010-01-25 15:34:13.259507851 +0100
***************
*** 18,26 ****
--- 18,28 ----
  
  #include "access/tuptoaster.h"
  #include "catalog/pg_type.h"
+ #include "lib/stringinfo.h"
  #include "libpq/md5.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
+ #include "nodes/execnodes.h"
  #include "parser/scansup.h"
  #include "regex/regex.h"
  #include "utils/builtins.h"
***************
*** 48,53 ****
--- 50,67 ----
  	int			skiptable[256]; /* skip distance for given mismatched char */
  } TextPositionState;
  
+ /* type for state data of string_agg function */
+ typedef struct
+ {
+ 	StringInfo	strInfo;
+ 	char	delimiter[1];		/* separator string - one or more chars */
+ } StringAggState;
+ 
+ static StringAggState *accumStringResult(StringAggState *state, 
+ 						    text *elem, 
+ 						    text *delimiter, 
+ 						    MemoryContext aggcontext);
+ 
  #define DatumGetUnknownP(X)			((unknown *) PG_DETOAST_DATUM(X))
  #define DatumGetUnknownPCopy(X)		((unknown *) PG_DETOAST_DATUM_COPY(X))
  #define PG_GETARG_UNKNOWN_P(n)		DatumGetUnknownP(PG_GETARG_DATUM(n))
***************
*** 3143,3145 ****
--- 3157,3311 ----
  
  	PG_RETURN_INT32(result);
  }
+ 
+ /*
+  * string_agg
+  *  
+  * Concates values and returns string.
+  *
+  * Syntax:
+  *     FUNCTION string_agg(string text, delimiter text = '')
+  *      RETURNS text;
+  *
+  * Note: any NULL value is ignored.
+  */
+ static StringAggState *
+ accumStringResult(StringAggState *state, text *elem, text *delimiter, 
+ 						    MemoryContext aggcontext)
+ {
+ 	MemoryContext	oldcontext;
+ 	
+ 	/* 
+ 	 * when state is NULL, create new state value.
+ 	 */
+ 	if (state == NULL)
+ 	{
+ 		if (delimiter != NULL)
+ 		{
+ 			char *dstr = text_to_cstring(delimiter);
+ 			int len = strlen(dstr);
+ 			
+ 			oldcontext = MemoryContextSwitchTo(aggcontext);
+ 			state = palloc(sizeof(StringAggState) + len);
+ 			
+ 			/* copy delimiter to state var */
+ 			memcpy(&state->delimiter, dstr, len + 1);
+ 		}
+ 		else
+ 		{
+ 			oldcontext = MemoryContextSwitchTo(aggcontext);
+ 			state = palloc(sizeof(StringAggState));
+ 			state->delimiter[0] = '\0';
+ 		}
+ 		
+ 		/* Initialise StringInfo */
+ 		state->strInfo = NULL;
+ 		
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	
+ 	/* only when element isn't null */
+ 	if (elem != NULL)
+ 	{
+ 		char	*value = text_to_cstring(elem);
+ 
+ 		oldcontext = MemoryContextSwitchTo(aggcontext);
+ 		if (state->strInfo != NULL)
+ 			appendStringInfoString(state->strInfo, state->delimiter);
+ 		else
+ 			state->strInfo = makeStringInfo();
+ 			
+ 		appendStringInfoString(state->strInfo, value);
+ 		MemoryContextSwitchTo(oldcontext);
+ 	}
+ 	
+ 	return state;
+ }
+ 
+ Datum
+ string_agg1_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext	aggcontext;
+ 	StringAggState *state = NULL;
+ 	text *elem;
+ 
+ 	if (fcinfo->context && IsA(fcinfo->context, AggState))
+ 		aggcontext = ((AggState *) fcinfo->context)->aggcontext;
+ 	else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) fcinfo->context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "string_agg1_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 	
+ 	state = PG_ARGISNULL(0) ? NULL : (StringAggState *) PG_GETARG_POINTER(0);
+ 	elem = PG_ARGISNULL(1) ? NULL : PG_GETARG_TEXT_PP(1);
+ 	
+ 	state = accumStringResult(state,
+ 					    elem,
+ 					    NULL,
+ 					    aggcontext);
+ 
+ 	/*
+ 	 * The transition type for string_agg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.	
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum 
+ string_agg2_transfn(PG_FUNCTION_ARGS)
+ {
+ 	MemoryContext	aggcontext;
+ 	StringAggState *state = NULL;
+ 	text *elem;
+ 	text *delimiter = NULL;
+ 
+ 	if (fcinfo->context && IsA(fcinfo->context, AggState))
+ 		aggcontext = ((AggState *) fcinfo->context)->aggcontext;
+ 	else if (fcinfo->context && IsA(fcinfo->context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) fcinfo->context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "string_agg2_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 	
+ 	state = PG_ARGISNULL(0) ? NULL : (StringAggState *) PG_GETARG_POINTER(0);
+ 	elem = PG_ARGISNULL(1) ? NULL : PG_GETARG_TEXT_PP(1);
+ 	delimiter = PG_ARGISNULL(2) ? NULL : PG_GETARG_TEXT_PP(2);
+ 	
+ 	state = accumStringResult(state,
+ 					    elem,
+ 					    delimiter,
+ 					    aggcontext);
+ 
+ 	/*
+ 	 * The transition type for string_agg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum
+ string_agg_finalfn(PG_FUNCTION_ARGS)
+ {
+ 	StringAggState *state = NULL;
+ 
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 		
+ 	/* cannot be called directly because of internal-type argument */
+ 	Assert(fcinfo->context &&
+ 		    (IsA(fcinfo->context, AggState) ||
+ 			    IsA(fcinfo->context, WindowAggState)));
+ 	
+ 	state = (StringAggState *) PG_GETARG_POINTER(0);
+ 	if (state->strInfo != NULL)
+ 		PG_RETURN_TEXT_P(cstring_to_text(state->strInfo->data));
+ 	else
+ 		PG_RETURN_NULL();
+ }
*** ./src/include/catalog/pg_aggregate.h.orig	2010-01-05 02:06:56.000000000 +0100
--- ./src/include/catalog/pg_aggregate.h	2010-01-25 15:14:08.953574327 +0100
***************
*** 223,228 ****
--- 223,232 ----
  /* array */
  DATA(insert ( 2335	array_agg_transfn	array_agg_finalfn		0	2281	_null_ ));
  
+ /* text */
+ DATA(insert (3537	string_agg1_transfn	string_agg_finalfn			0	2281	_null_ ));
+ DATA(insert (3538	string_agg2_transfn	string_agg_finalfn			0	2281	_null_ ));
+ 
  /*
   * prototypes for functions in pg_aggregate.c
   */
*** ./src/include/catalog/pg_proc.h.orig	2010-01-19 15:11:32.000000000 +0100
--- ./src/include/catalog/pg_proc.h	2010-01-25 15:19:30.631574450 +0100
***************
*** 2818,2823 ****
--- 2818,2834 ----
  DATA(insert OID = 2817 (  float8_corr				PGNSP PGUID 12 1 0 0 f f f t f i 1 0 701 "1022" _null_ _null_ _null_ _null_ float8_corr _null_ _null_ _null_ ));
  DESCR("CORR(double, double) aggregate final function");
  
+ DATA(insert OID = 3534 (  string_agg1_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 2 0 2281 "2281 25" _null_ _null_ _null_ _null_ string_agg1_transfn _null_ _null_ _null_ ));
+ DESCR("string_agg one param transition function");
+ DATA(insert OID = 3535 (  string_agg2_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 3 0 2281 "2281 25 25" _null_ _null_ _null_ _null_ string_agg2_transfn _null_ _null_ _null_ ));
+ DESCR("string_agg two params transition function");
+ DATA(insert OID = 3536 (  string_agg_finalfn   PGNSP PGUID 12 1 0 0 f f f f f i 1 0 25 "2281" _null_ _null_ _null_ _null_ string_agg_finalfn _null_ _null_ _null_ ));
+ DESCR("string_agg final function");
+ DATA(insert OID = 3537 (  string_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 1 0 25 "25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string");
+ DATA(insert OID = 3538 (  string_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 2 0 25 "25 25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string with delimiter");
+ 
  /* To ASCII conversion */
  DATA(insert OID = 1845 ( to_ascii	PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_	to_ascii_default _null_ _null_ _null_ ));
  DESCR("encode text from DB encoding to ASCII text");
*** ./src/include/utils/builtins.h.orig	2010-01-19 06:50:18.000000000 +0100
--- ./src/include/utils/builtins.h	2010-01-25 15:22:44.784448955 +0100
***************
*** 722,727 ****
--- 722,731 ----
  
  extern Datum pg_column_size(PG_FUNCTION_ARGS);
  
+ extern Datum string_agg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
+ 
  /* version.c */
  extern Datum pgsql_version(PG_FUNCTION_ARGS);
  
***************
*** 770,775 ****
--- 774,782 ----
  extern Datum chr (PG_FUNCTION_ARGS);
  extern Datum repeat(PG_FUNCTION_ARGS);
  extern Datum ascii(PG_FUNCTION_ARGS);
+ extern Datum string_agg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
  
  /* inet_net_ntop.c */
  extern char *inet_net_ntop(int af, const void *src, int bits,
*** ./src/test/regress/expected/aggregates.out.orig	2009-12-15 18:57:47.000000000 +0100
--- ./src/test/regress/expected/aggregates.out	2010-01-25 15:54:06.000000000 +0100
***************
*** 799,801 ****
--- 799,832 ----
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
  LINE 1: select aggfns(distinct a,a,c order by a,b)
                                                  ^
+ -- string_agg tests
+ select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+   string_agg  
+ --------------
+  aaaabbbbcccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+    string_agg   
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+    string_agg   
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+  string_agg 
+ ------------
+  bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values(null),(null)) g(a);
+  string_agg 
+ ------------
+  
+ (1 row)
+ 
*** ./src/test/regress/sql/aggregates.sql.orig	2009-12-15 18:57:48.000000000 +0100
--- ./src/test/regress/sql/aggregates.sql	2010-01-25 15:53:48.406291403 +0100
***************
*** 355,357 ****
--- 355,364 ----
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
  select aggfns(distinct a,a,c order by a,b)
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
+ 
+ -- string_agg tests
+ select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values(null),(null)) g(a);

#44

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 16 years ago

In reply to: Fujii Masao (#42)

Re: testing cvs HEAD - HS/SR - missing file

Fujii Masao wrote:

On Wed, Jan 27, 2010 at 8:37 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

But SendRqstPtr comes from LogwrtResult.Write, surely that's correct, no?

Right. But the point is that LogwrtResult.Write might indicate 0/FF000000
because it's the last byte + 1 written out. XLogRead() treats this as the
location where WAL begins to be read, so converts it to the WAL file name
0000000100000000000000FF by using XLByteToSeg.

Ah, I see it now, thanks. How confusing.

Your patch clearly works, but I wonder how we should logically treat
that boundary value. You're setting sentPtr to the beginning of the
logid, when LogwrtResult.Write is 0/FF000000. So you're logically
thinking that the non-existent FF log segment has been sent to the
standby as soon as the previous segment has been fully sent. Another
option would be to set SendRqstPtr to the beginning of next logid, when
LogwrtResult.Write is 0/FF000000.. Yet another option would be to set
startptr to the beginning of next logid, when sentPtr is 0/FF000000:

***************
*** 633,638 ****
--- 633,648 ----
  		 * WAL record.
  		 */
  		startptr = sentPtr;
+ 		if (startptr.xrecoff >= XLogFileSize)
+ 		{
+ 			/*
+ 			 * crossing a logid boundary, skip the non-existent last log
+ 			 * segment in previous logical log file.
+ 			 */
+ 			startptr.xlogid += 1;
+ 			startptr.xrecoff = 0;
+ 		}
+
  		endptr = startptr;
  		XLByteAdvance(endptr, MAX_SEND_SIZE);
  		/* round down to page boundary. */

I think I prefer that, it feels logically more correct. Setting sentPtr
beyond LogwrtResult.Write feels wrong, even though 0/FF000000 and
1/00000000 are really the same physical location. I'll commit it that way.

I also note that I broke that same thing in ReadRecord() in the
retry-logic patch I just comitted. I'll fix that too.

Thanks for the testing, Erik!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#45

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Tom Lane (#36)

Re: Review: listagg aggregate

On Tue, Jan 26, 2010 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

But what it *produces* is a string. For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.

This point is well-taken, but naming it string_agg() because it
produces a string doesn't seem quite descriptive enough. We might
someday (if we don't already) have a number of aggregates that produce
an output that is a string; we can't name them all by the output type.

True, but the same point could be made against array_agg, and that
didn't stop the committee from choosing that name. As long as
string_agg is the "most obvious" aggregate-to-string functionality,
which ISTM it is, I think it's all right for it to have pride of place
in naming.

Maybe so, but personally, I'd still prefer something more descriptive.

...Robert

#46

Erik Rijkers

er@xs4all.nl

almost 16 years ago

In reply to: Heikki Linnakangas (#44)

Re: testing cvs HEAD - HS/SR - missing file

On Wed, January 27, 2010 17:38, Heikki Linnakangas wrote:

I'll commit it that way.

This is just to let you know that these commits seem to have solved the bug.

I repeated the original test (700 MB pg_restore into a primary,
with two replicating slaves, on one machine).

Checking afterwards:
$ for port in 6565 6566 6567;
do
psql -qtAp $port -c "select pg_database_size('${PGDATABASE}'), version()";
done
17057193792|PostgreSQL 9.0devel-sr_primary [...]
17057193792|PostgreSQL 9.0devel-sr_slavery
17057193792|PostgreSQL 9.0devel-sr_slave02

So that looked promising :) To make sure I also COPYd
all tables to file and diff'ed them. There were indeed
no differences.

I did get at one point in the sr_slave02 log: (consecutive lines)

LOG: consistent recovery state reached at 0/2000000
LOG: database system is ready to accept read only connections
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection

but apparently this is not problematic?

HS/SR is a fantastic set of features. I'll keep hammering away a bit at the dynamic duo; if you
have specific testing ideas, let me know.

Thanks!

Erik Rijkers

#47

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Pavel Stehule (#43)

Re: Review: listagg aggregate

On Jan 27, 2010, at 7:58 AM, Pavel Stehule wrote:

with actualised oids

Thanks. Looks good, modulo my preference for concat_agg(). I'll mark it ready for committer.

Best,

David

#48

Fujii Masao

masao.fujii@gmail.com

almost 16 years ago

In reply to: Erik Rijkers (#46)

Re: testing cvs HEAD - HS/SR - missing file

On Thu, Jan 28, 2010 at 9:32 AM, Erik Rijkers <er@xs4all.nl> wrote:

On Wed, January 27, 2010 17:38, Heikki Linnakangas wrote:

I'll commit it that way.

Thanks Heikki! Yeah, your approach makes more sense.

This is just to let you know that these commits seem to have solved the bug.

Good! Thanks Erik!

I did get at one point in the sr_slave02 log: (consecutive lines)

LOG: consistent recovery state reached at 0/2000000
LOG: database system is ready to accept read only connections
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection

but apparently this is not problematic?

It seems to me that you've unexpectedly killed (e.g., kill -9)
a client connected to the standby during executing a read-only
query.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#49

Takahiro Itagaki

itagaki.takahiro@oss.ntt.co.jp

almost 16 years ago

In reply to: Pavel Stehule (#43)

1 attachment(s)

Re: Review: listagg aggregate

Pavel Stehule <pavel.stehule@gmail.com> wrote:

with actualised oids

I'm checking the patch for commit, and have a couple of comments.

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

* Can we use StringInfo directly as the aggregate context instead of
StringAggState? For the first reason, we need to drop 'delimiter' field
from struct StringAggState. Now it has only StringInfo field.

* We'd better avoiding to call text_to_cstring() for delimitors and elements
for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Comments?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachments:

string_agg_20100128.patchapplication/octet-stream; name=string_agg_20100128.patchDownload

diff -cprN head/doc/src/sgml/func.sgml work/doc/src/sgml/func.sgml
*** head/doc/src/sgml/func.sgml	Tue Jan 26 11:47:08 2010
--- work/doc/src/sgml/func.sgml	Thu Jan 28 09:58:12 2010
***************
*** 1789,1794 ****
--- 1789,1798 ----
      </tgroup>
     </table>
  
+    <para>
+    See also <xref linkend="functions-aggregate"> about the aggregate
+    function <function>string_agg</function>.
+    </para>
  
     <table id="conversion-names">
      <title>Built-in Conversions</title>
*************** SELECT NULLIF(value, '(none)') ...
*** 9837,9842 ****
--- 9841,9863 ----
       </row>
  
       <row>
+       <entry>
+        <indexterm>
+         <primary>string_agg</primary>
+        </indexterm>
+        <function>string_agg(<replaceable class="parameter">expression</replaceable> 
+                      [, <replaceable class="parameter">expression</replaceable> ] )</function>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>
+        <type>text</type>
+       </entry>
+       <entry>input values concatenated into an string, optionally separated by the second argument</entry>
+      </row>
+ 
+      <row>
        <entry><function>sum(<replaceable class="parameter">expression</replaceable>)</function></entry>
        <entry>
         <type>smallint</type>, <type>int</type>,
diff -cprN head/src/backend/utils/adt/varlena.c work/src/backend/utils/adt/varlena.c
*** head/src/backend/utils/adt/varlena.c	Tue Jan 26 11:47:08 2010
--- work/src/backend/utils/adt/varlena.c	Thu Jan 28 11:29:55 2010
***************
*** 18,26 ****
--- 18,28 ----
  
  #include "access/tuptoaster.h"
  #include "catalog/pg_type.h"
+ #include "lib/stringinfo.h"
  #include "libpq/md5.h"
  #include "libpq/pqformat.h"
  #include "miscadmin.h"
+ #include "nodes/execnodes.h"
  #include "parser/scansup.h"
  #include "regex/regex.h"
  #include "utils/builtins.h"
*************** static bytea *bytea_substring(Datum str,
*** 73,78 ****
--- 75,81 ----
  				int L,
  				bool length_not_specified);
  static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
+ static StringInfo makeStringAggState(fmNodePtr context);
  
  
  /*****************************************************************************
*************** pg_column_size(PG_FUNCTION_ARGS)
*** 3315,3317 ****
--- 3318,3434 ----
  
  	PG_RETURN_INT32(result);
  }
+ 
+ /*
+  * string_agg
+  *  
+  * Concates values and returns string.
+  *
+  * Syntax:
+  *     FUNCTION string_agg(string text, delimiter text = '')
+  *      RETURNS text;
+  *
+  * Note: any NULL value is ignored.
+  */
+ static StringInfo
+ makeStringAggState(fmNodePtr context)
+ {
+ 	StringInfo		state;
+ 	MemoryContext	aggcontext;
+ 	MemoryContext	oldcontext;
+ 
+ 	if (context && IsA(context, AggState))
+ 		aggcontext = ((AggState *) context)->aggcontext;
+ 	else if (context && IsA(context, WindowAggState))
+ 		aggcontext = ((WindowAggState *) context)->wincontext;
+ 	else
+ 	{
+ 		/* cannot be called directly because of internal-type argument */
+ 		elog(ERROR, "string_agg_transfn called in non-aggregate context");
+ 		aggcontext = NULL;		/* keep compiler quiet */
+ 	}
+ 
+ 	/* Create state in aggregate context */
+ 	oldcontext = MemoryContextSwitchTo(aggcontext);
+ 	state = makeStringInfo();
+ 	MemoryContextSwitchTo(oldcontext);
+ 
+ 	return state;
+ }
+ 
+ Datum
+ string_agg1_transfn(PG_FUNCTION_ARGS)
+ {
+ 	StringInfo state;
+ 
+ 	state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0);
+ 
+ 	/* Append the element only when it isn't null */
+ 	if (!PG_ARGISNULL(1))
+ 	{
+ 		text *elem = PG_GETARG_TEXT_PP(1);
+ 
+ 		if (state == NULL)
+ 			state = makeStringAggState(fcinfo->context);
+ 		appendBinaryStringInfo(state, VARDATA_ANY(elem),
+ 							   VARSIZE_ANY_EXHDR(elem));
+ 	}
+ 
+ 	/*
+ 	 * The transition type for string_agg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.	
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum 
+ string_agg2_transfn(PG_FUNCTION_ARGS)
+ {
+ 	StringInfo state;
+ 
+ 	state = PG_ARGISNULL(0) ? NULL : (StringInfo) PG_GETARG_POINTER(0);
+ 
+ 	/* Append the element only when it isn't null */
+ 	if (!PG_ARGISNULL(1))
+ 	{
+ 		text *elem = PG_GETARG_TEXT_PP(1);
+ 
+ 		if (state == NULL)
+ 			state = makeStringAggState(fcinfo->context);
+ 		else if (!PG_ARGISNULL(2))
+ 		{
+ 			text *delim = PG_GETARG_TEXT_PP(2);
+ 			appendBinaryStringInfo(state, VARDATA_ANY(delim),
+ 								   VARSIZE_ANY_EXHDR(delim));
+ 		}
+ 
+ 		appendBinaryStringInfo(state, VARDATA_ANY(elem),
+ 							   VARSIZE_ANY_EXHDR(elem));
+ 	}
+ 
+ 	/*
+ 	 * The transition type for string_agg() is declared to be "internal", which
+ 	 * is a pass-by-value type the same size as a pointer.
+ 	 */
+ 	PG_RETURN_POINTER(state);
+ }
+ 
+ Datum
+ string_agg_finalfn(PG_FUNCTION_ARGS)
+ {
+ 	StringInfo state;
+ 
+ 	if (PG_ARGISNULL(0))
+ 		PG_RETURN_NULL();
+ 
+ 	/* cannot be called directly because of internal-type argument */
+ 	Assert(fcinfo->context &&
+ 		    (IsA(fcinfo->context, AggState) ||
+ 			    IsA(fcinfo->context, WindowAggState)));
+ 
+ 	state = (StringInfo) PG_GETARG_POINTER(0);
+ 	if (state != NULL)
+ 		PG_RETURN_TEXT_P(cstring_to_text(state->data));
+ 	else
+ 		PG_RETURN_NULL();
+ }
diff -cprN head/src/include/catalog/pg_aggregate.h work/src/include/catalog/pg_aggregate.h
*** head/src/include/catalog/pg_aggregate.h	Tue Jan  5 10:25:20 2010
--- work/src/include/catalog/pg_aggregate.h	Thu Jan 28 09:58:12 2010
*************** DATA(insert ( 2901 xmlconcat2	  -					0	
*** 223,228 ****
--- 223,232 ----
  /* array */
  DATA(insert ( 2335	array_agg_transfn	array_agg_finalfn		0	2281	_null_ ));
  
+ /* text */
+ DATA(insert (3537	string_agg1_transfn	string_agg_finalfn			0	2281	_null_ ));
+ DATA(insert (3538	string_agg2_transfn	string_agg_finalfn			0	2281	_null_ ));
+ 
  /*
   * prototypes for functions in pg_aggregate.c
   */
diff -cprN head/src/include/catalog/pg_proc.h work/src/include/catalog/pg_proc.h
*** head/src/include/catalog/pg_proc.h	Tue Jan 26 11:47:08 2010
--- work/src/include/catalog/pg_proc.h	Thu Jan 28 09:58:12 2010
*************** DESCR("COVAR_SAMP(double, double) aggreg
*** 2830,2835 ****
--- 2830,2846 ----
  DATA(insert OID = 2817 (  float8_corr				PGNSP PGUID 12 1 0 0 f f f t f i 1 0 701 "1022" _null_ _null_ _null_ _null_ float8_corr _null_ _null_ _null_ ));
  DESCR("CORR(double, double) aggregate final function");
  
+ DATA(insert OID = 3534 (  string_agg1_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 2 0 2281 "2281 25" _null_ _null_ _null_ _null_ string_agg1_transfn _null_ _null_ _null_ ));
+ DESCR("string_agg one param transition function");
+ DATA(insert OID = 3535 (  string_agg2_transfn   PGNSP PGUID 12 1 0 0 f f f f f i 3 0 2281 "2281 25 25" _null_ _null_ _null_ _null_ string_agg2_transfn _null_ _null_ _null_ ));
+ DESCR("string_agg two params transition function");
+ DATA(insert OID = 3536 (  string_agg_finalfn   PGNSP PGUID 12 1 0 0 f f f f f i 1 0 25 "2281" _null_ _null_ _null_ _null_ string_agg_finalfn _null_ _null_ _null_ ));
+ DESCR("string_agg final function");
+ DATA(insert OID = 3537 (  string_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 1 0 25 "25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string");
+ DATA(insert OID = 3538 (  string_agg		   PGNSP PGUID 12 1 0 0 t f f f f i 2 0 25 "25 25" _null_ _null_ _null_ _null_ aggregate_dummy _null_ _null_ _null_ ));
+ DESCR("concatenate aggregate input into an string with delimiter");
+ 
  /* To ASCII conversion */
  DATA(insert OID = 1845 ( to_ascii	PGNSP PGUID 12 1 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_	to_ascii_default _null_ _null_ _null_ ));
  DESCR("encode text from DB encoding to ASCII text");
diff -cprN head/src/include/utils/builtins.h work/src/include/utils/builtins.h
*** head/src/include/utils/builtins.h	Tue Jan 26 11:47:08 2010
--- work/src/include/utils/builtins.h	Thu Jan 28 09:58:12 2010
*************** extern Datum unknownsend(PG_FUNCTION_ARG
*** 724,729 ****
--- 724,733 ----
  
  extern Datum pg_column_size(PG_FUNCTION_ARGS);
  
+ extern Datum string_agg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
+ 
  /* version.c */
  extern Datum pgsql_version(PG_FUNCTION_ARGS);
  
*************** extern Datum translate(PG_FUNCTION_ARGS)
*** 772,777 ****
--- 776,784 ----
  extern Datum chr (PG_FUNCTION_ARGS);
  extern Datum repeat(PG_FUNCTION_ARGS);
  extern Datum ascii(PG_FUNCTION_ARGS);
+ extern Datum string_agg1_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg2_transfn(PG_FUNCTION_ARGS);
+ extern Datum string_agg_finalfn(PG_FUNCTION_ARGS);
  
  /* inet_net_ntop.c */
  extern char *inet_net_ntop(int af, const void *src, int bits,
diff -cprN head/src/test/regress/expected/aggregates.out work/src/test/regress/expected/aggregates.out
*** head/src/test/regress/expected/aggregates.out	Wed Dec 16 09:17:22 2009
--- work/src/test/regress/expected/aggregates.out	Thu Jan 28 11:41:07 2010
*************** select aggfns(distinct a,a,c order by a,
*** 799,801 ****
--- 799,838 ----
  ERROR:  in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
  LINE 1: select aggfns(distinct a,a,c order by a,b)
                                                  ^
+ -- string_agg tests
+ select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+   string_agg  
+ --------------
+  aaaabbbbcccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+    string_agg   
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+    string_agg   
+ ----------------
+  aaaa,bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+  string_agg 
+ ------------
+  bbbb,cccc
+ (1 row)
+ 
+ select string_agg(a,',') from (values(null),(null)) g(a);
+  string_agg 
+ ------------
+  
+ (1 row)
+ 
+ select string_agg(elem, delim) from (values('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
+  string_agg 
+ ------------
+  A+B*C
+ (1 row)
+ 
diff -cprN head/src/test/regress/sql/aggregates.sql work/src/test/regress/sql/aggregates.sql
*** head/src/test/regress/sql/aggregates.sql	Wed Dec 16 09:17:22 2009
--- work/src/test/regress/sql/aggregates.sql	Thu Jan 28 11:40:15 2010
*************** select aggfns(distinct a,b,c order by a,
*** 355,357 ****
--- 355,365 ----
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
  select aggfns(distinct a,a,c order by a,b)
    from (values (1,1,'foo')) v(a,b,c), generate_series(1,2) i;
+ 
+ -- string_agg tests
+ select string_agg(a) from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values('aaaa'),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values('aaaa'),(null),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values(null),(null),('bbbb'),('cccc')) g(a);
+ select string_agg(a,',') from (values(null),(null)) g(a);
+ select string_agg(elem, delim) from (values('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);

#50

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Takahiro Itagaki (#49)

Re: Review: listagg aggregate

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

Ooh, nice.

* Can we use StringInfo directly as the aggregate context instead of
StringAggState? For the first reason, we need to drop 'delimiter' field
from struct StringAggState. Now it has only StringInfo field.

Makes sense.

* We'd better avoiding to call text_to_cstring() for delimitors and elements
for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Yes please.

Comments?

Patch looks great, thank you!

David

#51

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Takahiro Itagaki (#49)

Re: Review: listagg aggregate

2010/1/28 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:

Pavel Stehule <pavel.stehule@gmail.com> wrote:

with actualised oids

I'm checking the patch for commit, and have a couple of comments.

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

no I dislike it. This using is nonsense.

Regards
Pavel

Show quoted text

* Can we use StringInfo directly as the aggregate context instead of
StringAggState? For the first reason, we need to drop 'delimiter' field
from struct StringAggState. Now it has only StringInfo field.

* We'd better avoiding to call text_to_cstring() for delimitors and elements
for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Comments?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#52

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#50)

Re: Review: listagg aggregate

2010/1/28 David E. Wheeler <david@kineticode.com>:

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

Ooh, nice.

* Can we use StringInfo directly as the aggregate context instead of
StringAggState? For the first reason, we need to drop 'delimiter' field
from struct StringAggState. Now it has only StringInfo field.

Makes sense.

no, has not.

Pavel

Show quoted text

* We'd better avoiding to call text_to_cstring() for delimitors and elements
for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Yes please.

Comments?

Patch looks great, thank you!

David

#53

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#52)

Re: Review: listagg aggregate

2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:

2010/1/28 David E. Wheeler <david@kineticode.com>:

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

Ooh, nice.

* Can we use StringInfo directly as the aggregate context instead of
StringAggState? For the first reason, we need to drop 'delimiter' field
from struct StringAggState. Now it has only StringInfo field.

Makes sense.

no, has not.

What is use case for this behave??

Pavel

Show quoted text

Pavel

* We'd better avoiding to call text_to_cstring() for delimitors and elements
for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Yes please.

Comments?

Patch looks great, thank you!

David

#54

Takahiro Itagaki

itagaki.takahiro@oss.ntt.co.jp

almost 16 years ago

In reply to: Pavel Stehule (#53)

Re: Review: listagg aggregate

Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:

2010/1/28 David E. Wheeler <david@kineticode.com>:

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

no, has not.

What is use case for this behave??

I also think this usage is nonsense, but seems to be the most consistent
behavior for me. I didn't say anything about use-cases, but just capability.
Since we allow such kinds of usage for now, you need to verify the
delimiter is not changed rather than ignoring it if you want disallow
to change the delimiter during an aggregation.

Of course you can cache the first delimiter at start, and check delimiters
are not changed every calls -- but I think it is just a waste of cpu cycle.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#55

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Takahiro Itagaki (#54)

Re: Review: listagg aggregate

On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:

2010/1/28 David E. Wheeler <david@kineticode.com>:

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

no, has not.

What is use case for this behave??

I also think this usage is nonsense, but seems to be the most consistent
behavior for me. I didn't say anything about use-cases, but just capability.
Since we allow such kinds of usage for now, you need to verify the
delimiter is not changed rather than ignoring it if you want disallow
to change the delimiter during an aggregation.

Of course you can cache the first delimiter at start, and check delimiters
are not changed every calls -- but I think it is just a waste of cpu cycle.

Agreed. Not caching it seems the simplest solution.

...Robert

#56

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Robert Haas (#55)

Re: Review: listagg aggregate

2010/1/28 Robert Haas <robertmhaas@gmail.com>:

On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/1/28 Pavel Stehule <pavel.stehule@gmail.com>:

2010/1/28 David E. Wheeler <david@kineticode.com>:

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

* I think we cannot cache the delimiter at the first call.
For example,
SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.

no, has not.

What is use case for this behave??

I also think this usage is nonsense, but seems to be the most consistent
behavior for me. I didn't say anything about use-cases, but just capability.
Since we allow such kinds of usage for now, you need to verify the
delimiter is not changed rather than ignoring it if you want disallow
to change the delimiter during an aggregation.

Of course you can cache the first delimiter at start, and check delimiters
are not changed every calls -- but I think it is just a waste of cpu cycle.

Agreed. Not caching it seems the simplest solution.

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Pavel

Show quoted text

...Robert

#57

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#56)

Re: Review: listagg aggregate

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

...Robert

#58

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Robert Haas (#57)

Re: Review: listagg aggregate

2010/1/28 Robert Haas <robertmhaas@gmail.com>:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

It is only a few lines with zero complexity.

The main issue of Takahiro proposal is "unclean" behave.

we can have a content

c1 c2
-----------
c11, c12,
c21, c22

and result of string_agg(c1, c2)

have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
will be NULL ?? I checked oracle. Oracle doesn't allow variable as
delimiter. We can't check it. But we can fix first value and using it
as constant.

More - Takahiro proposal has one performance gain. It have to deTOAST
separator value in every cycle.

Takahiro has true - we can store length of separator and we can add
separator to cumulative value as binary value.

I prefer original behave with note in documentation - so as separator
is used a value on first row, when is used expression and not
constant.

Regards
Pavel

Show quoted text

...Robert

#59

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Robert Haas (#57)

Re: Review: listagg aggregate

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

Yeah. The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

I think Pavel's underlying complaint is that if the delimiter
argument isn't constant, then we're exposing an implementation
dependency in terms of just which values get separated by which
delimiters. The most practical implementation seems to be that
the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter *precedes* the associated value,
which is a bit surprising given the order in which one writes
them. Not sure if this is worth documenting though. Those two
or three people who actually try it will figure it out soon enough.

regards, tom lane

#60

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Tom Lane (#59)

Re: Review: listagg aggregate

On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

Yeah. The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

Right.

I think Pavel's underlying complaint is that if the delimiter
argument isn't constant, then we're exposing an implementation
dependency in terms of just which values get separated by which
delimiters. The most practical implementation seems to be that
the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter *precedes* the associated value,
which is a bit surprising given the order in which one writes
them. Not sure if this is worth documenting though. Those two
or three people who actually try it will figure it out soon enough.

Yeah, I'm thoroughly unworried about it.

...Robert

#61

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Takahiro Itagaki (#49)

Re: Review: listagg aggregate

On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Surely the names of the transition and final functions should match
the name of the aggregate. But if there's a proposal on the table to
call this string_app_with_sep() instead of just string_agg(), +1 from
me.

...Robert

#62

Hitoshi Harada

umi.tanuki@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#58)

Re: Review: listagg aggregate

2010/1/29 Pavel Stehule <pavel.stehule@gmail.com>:

2010/1/28 Robert Haas <robertmhaas@gmail.com>:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

It is only a few lines with zero complexity.

The main issue of Takahiro proposal is "unclean" behave.

we can have a content

c1 c2
-----------
c11, c12,
c21, c22

and result of string_agg(c1, c2)

have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
will be NULL ?? I checked oracle. Oracle doesn't allow variable as
delimiter. We can't check it. But we can fix first value and using it
as constant.

What about get_fn_expr_arg_stable() to check if the argument is stable
during aggregate?

Regards,

--
Hitoshi Harada

#63

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Robert Haas (#60)

Re: Review: listagg aggregate

2010/1/28 Robert Haas <robertmhaas@gmail.com>:

On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

Yeah. The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

Right.

I think Pavel's underlying complaint is that if the delimiter
argument isn't constant, then we're exposing an implementation
dependency in terms of just which values get separated by which
delimiters. The most practical implementation seems to be that
the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter *precedes* the associated value,
which is a bit surprising given the order in which one writes
them. Not sure if this is worth documenting though. Those two
or three people who actually try it will figure it out soon enough.

ok - there is one query,

in 99.99% the second argument will be a constant. Can we use this
information and optimize function for this case?

The detoast on every row can take some percent from a performance.

Pavel

Show quoted text

Yeah, I'm thoroughly unworried about it.

...Robert

#64

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Robert Haas (#61)

Re: Review: listagg aggregate

2010/1/28 Robert Haas <robertmhaas@gmail.com>:

On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
They are almost "internal names", but we could have more
like string_agg_with_sep_transfn.

Surely the names of the transition and final functions should match
the name of the aggregate. But if there's a proposal on the table to
call this string_app_with_sep() instead of just string_agg(), +1 from
me.

no, >>>string_app_with_sep<<< is too long for common using.

Pavel

Show quoted text

...Robert

#65

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Pavel Stehule (#63)

Re: Review: listagg aggregate

Pavel Stehule <pavel.stehule@gmail.com> writes:

in 99.99% the second argument will be a constant. Can we use this
information and optimize function for this case?

The detoast on every row can take some percent from a performance.

What detoast? There won't be one for a constant, nor even for a
variable in any sane situation --- who's going to be using
multi-kilobyte delimiter values? And if they do, aren't they likely
to run out of memory for the result long before the repeated detoasts
become an interesting problem? You're arguing about a case that
seems quite irrelevant to the real world.

regards, tom lane

#66

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Hitoshi Harada (#62)

Re: Review: listagg aggregate

2010/1/28 Hitoshi Harada <umi.tanuki@gmail.com>:

2010/1/29 Pavel Stehule <pavel.stehule@gmail.com>:

2010/1/28 Robert Haas <robertmhaas@gmail.com>:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation. I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

It is only a few lines with zero complexity.

The main issue of Takahiro proposal is "unclean" behave.

we can have a content

c1 c2
-----------
c11, c12,
c21, c22

and result of string_agg(c1, c2)

have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
will be NULL ?? I checked oracle. Oracle doesn't allow variable as
delimiter. We can't check it. But we can fix first value and using it
as constant.

What about get_fn_expr_arg_stable() to check if the argument is stable
during aggregate?

I newer know so this function exists. Now we can

a) check and allow only stable params
b) when second parameter is stable, then store it and use it as constant.

I prefer a)

Pavel

Show quoted text

Regards,

--
Hitoshi Harada

#67

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Tom Lane (#65)

Re: Review: listagg aggregate

2010/1/28 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

in 99.99% the second argument will be a constant. Can we use this
information and optimize function for this case?

The detoast on every row can take some percent from a performance.

What detoast? There won't be one for a constant, nor even for a
variable in any sane situation --- who's going to be using
multi-kilobyte delimiter values? And if they do, aren't they likely
to run out of memory for the result long before the repeated detoasts
become an interesting problem? You're arguing about a case that
seems quite irrelevant to the real world.

I asking

with get_fn_expr_arg_stable() we are able to fix second parameter
without some performance issues.

Regards
Pavel

Show quoted text

regards, tom lane

#68

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Hitoshi Harada (#62)

Re: Review: listagg aggregate

Hitoshi Harada <umi.tanuki@gmail.com> writes:

What about get_fn_expr_arg_stable() to check if the argument is stable
during aggregate?

Seems quite expensive (possible catalog lookups) and there still isn't
any strong argument why we should bother.

regards, tom lane

#69

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: Pavel Stehule (#67)

Re: Review: listagg aggregate

Pavel Stehule <pavel.stehule@gmail.com> writes:

with get_fn_expr_arg_stable() we are able to fix second parameter
without some performance issues.

No, that will create its own performance issues ---
get_fn_expr_arg_stable isn't especially cheap.
If there were a really strong reason why we had to do it,
then I'd agree, but frankly the argument for disallowing
a variable delimiter is too thin.

regards, tom lane

#70

Marko Tiikkaja

marko.tiikkaja@cs.helsinki.fi

almost 16 years ago

In reply to: Pavel Stehule (#66)

Re: Review: listagg aggregate

On 2010-01-28 19:17, Pavel Stehule wrote:

2010/1/28 Hitoshi Harada <umi.tanuki@gmail.com>:

What about get_fn_expr_arg_stable() to check if the argument is stable
during aggregate?

I newer know so this function exists. Now we can

a) check and allow only stable params
b) when second parameter is stable, then store it and use it as constant.

I prefer a)

Someone might have a perfectly good use case for using different
delimiters. I don't think it's a good idea to be artificially limiting
what you can and can't do.

Regards,
Marko Tiikkaja

#71

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Marko Tiikkaja (#70)

Re: Review: listagg aggregate

On Jan 28, 2010, at 9:29 AM, Marko Tiikkaja wrote:

Someone might have a perfectly good use case for using different
delimiters. I don't think it's a good idea to be artificially limiting
what you can and can't do.

David

#72

Greg Stark

stark@mit.edu

almost 16 years ago

In reply to: Tom Lane (#59)

Re: Review: listagg aggregate

One situation where this could actually matter in the long term is if we
want to have an optimization for aggregate functions whose state variables
can be combined. this could be important if we ever want to do parallel
processing someday.

So we could have, for example two subjobs build two sublists and then
combiner the two lists later. in that case the separator might not be the
one the user expected - our put another way the one the user expected might
not be available when we need it.

We could say this isn't a problem because not all aggregate functions will
be amenable to such an optimization and perhaps this will just be one of
them. Alternately we could just have faith that a solution will be found -
it doesn't seem like it should be an insoluble problem to me.

greg

On 28 Jan 2010 15:57, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule <pavel....

Yeah. The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subs...

#73

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Tom Lane (#69)

Re: Review: listagg aggregate

2010/1/28 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

with get_fn_expr_arg_stable() we are able to fix second parameter
without some performance issues.

No, that will create its own performance issues ---
get_fn_expr_arg_stable isn't especially cheap.
If there were a really strong reason why we had to do it,
then I'd agree, but frankly the argument for disallowing
a variable delimiter is too thin.

it could be called only once. But I agree, so could be better, check
it in parser time.

ok - I am only one who like original behave - so I am withdrawing.
Robert, If you like, please commit Itagaki's patch. + add note about
behave when second parameter isn't stable.

Regards
Pavel Stehule

Show quoted text

regards, tom lane

#74

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#73)

Re: Review: listagg aggregate

On Fri, Jan 29, 2010 at 2:43 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/1/28 Tom Lane <tgl@sss.pgh.pa.us>:

Pavel Stehule <pavel.stehule@gmail.com> writes:

with get_fn_expr_arg_stable() we are able to fix second parameter
without some performance issues.

No, that will create its own performance issues ---
get_fn_expr_arg_stable isn't especially cheap.
If there were a really strong reason why we had to do it,
then I'd agree, but frankly the argument for disallowing
a variable delimiter is too thin.

it could be called only once. But I agree, so could be better, check
it in parser time.

ok - I am only one who like original behave - so I am withdrawing.
Robert, If you like, please commit Itagaki's patch. + add note about
behave when second parameter isn't stable.

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one. But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

...Robert

#75

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Robert Haas (#74)

Re: Review: listagg aggregate

On Jan 29, 2010, at 10:43 AM, Robert Haas wrote:

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one. But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The documentation for all the functions on that page could use a little love, frankly.

Best,

David

#76

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: David E. Wheeler (#75)

Re: Review: listagg aggregate

On Fri, Jan 29, 2010 at 1:45 PM, David E. Wheeler <david@kineticode.com> wrote:

On Jan 29, 2010, at 10:43 AM, Robert Haas wrote:

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one. But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The documentation for all the functions on that page could use a little love, frankly.

Want to take a short at it?

...Robert

#77

David E. Wheeler

david@kineticode.com

almost 16 years ago

In reply to: Robert Haas (#76)

Re: Review: listagg aggregate

On Jan 29, 2010, at 10:46 AM, Robert Haas wrote:

I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The documentation for all the functions on that page could use a little love, frankly.

Want to take a short at it?

ENOTUITS! /me is already sorely over-committed…

David

#78

Takahiro Itagaki

itagaki.takahiro@oss.ntt.co.jp

almost 16 years ago

In reply to: Robert Haas (#74)

Re: Review: listagg aggregate

Robert Haas <robertmhaas@gmail.com> wrote:

ok - I am only one who like original behave - so I ?am withdrawing.
Robert, If you like, please commit Itagaki's patch. + add note about
behave when second parameter isn't stable.

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one. But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

Applied with some editorialization. Please check the docs are good enough.

I removed a test pattern for variable delimiters from regression test
because it might be an undefined behavior. We might change the behavior
in the future, so we guarantee only constant delimiters.

The transition functions are renamed to "string_agg_transfn" and
"string_agg_delim_transfn". We cannot use "string_agg_transfn" for
both names because we cast the function name as regproc in bootstrap;
it complains about duplicated function names.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#79

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Takahiro Itagaki (#78)

Re: Review: listagg aggregate

2010/2/1 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:

Robert Haas <robertmhaas@gmail.com> wrote:

ok - I am only one who like original behave - so I ?am withdrawing.
Robert, If you like, please commit Itagaki's patch. + add note about
behave when second parameter isn't stable.

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one. But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

Applied with some editorialization. Please check the docs are good enough.

I removed a test pattern for variable delimiters from regression test
because it might be an undefined behavior. We might change the behavior
in the future, so we guarantee only constant delimiters.

The transition functions are renamed to "string_agg_transfn" and
"string_agg_delim_transfn". We cannot use "string_agg_transfn" for
both names because we cast the function name as regproc in bootstrap;
it complains about duplicated function names.

thank you very much

Pavel Stehule

Show quoted text

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#80

Thom Brown

thombrown@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#79)

Re: Review: listagg aggregate

I noticed that the regression test results don't include the following case:

select string_agg(a) from (values(null),(null)) g(a);

There is one similar where a delimiter is provided.

Which leads me to ask for clarification, would it be safe to assume that
string_agg can never itself return null?

Thom

#81

Pavel Stehule

pavel.stehule@gmail.com

almost 16 years ago

In reply to: Thom Brown (#80)

Re: Review: listagg aggregate

2010/2/1 Thom Brown <thombrown@gmail.com>:

I noticed that the regression test results don't include the following case:

select string_agg(a) from (values(null),(null)) g(a);

There is one similar where a delimiter is provided.

Which leads me to ask for clarification, would it be safe to assume that
string_agg can never itself return null?

if all values are null, then result is null.

Pavel

Show quoted text

Thom

#82

Thom Brown

thombrown@gmail.com

almost 16 years ago

In reply to: Pavel Stehule (#81)

Re: Review: listagg aggregate

On 1 February 2010 13:40, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2010/2/1 Thom Brown <thombrown@gmail.com>:

I noticed that the regression test results don't include the following

case:

select string_agg(a) from (values(null),(null)) g(a);

There is one similar where a delimiter is provided.

Which leads me to ask for clarification, would it be safe to assume that
string_agg can never itself return null?

if all values are null, then result is null.

Pavel

Ah, I was looking at the expected results, and couldn't see a NULL outcome,
but then these aren't indicated in such results anyway.

Thom

#83

Robert Haas

robertmhaas@gmail.com

almost 16 years ago

In reply to: Takahiro Itagaki (#78)

Re: Review: listagg aggregate

On Sun, Jan 31, 2010 at 10:29 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Applied with some editorialization. Please check the docs are good enough.

Looks pretty good. I have committed a couple very minor adjustments.

...Robert