patch: to_string, to_array functions
Hello
attached patch contains to_string and to_array functions. These
functions are equivalent of array_to_string and string_to_array
function with maybe more correct NULL handling.
postgres=# select to_array('1,2,3,4,,6',',');
to_array
------------------
{1,2,3,4,NULL,6}
(1 row)
postgres=# select to_array('1,2,3,4,,6',',','***');
to_array
----------------
{1,2,3,4,"",6}
(1 row)
postgres=# select to_string(array[1,2,3,4,NULL,6],',');
to_string
------------
1,2,3,4,,6
(1 row)
postgres=# select to_string(array[1,2,3,4,NULL,6],',','***');
to_string
---------------
1,2,3,4,***,6
(1 row)
Regards
Pavel Stehule
Attachments:
to_array.diffapplication/octet-stream; name=to_array.diffDownload
*** ./doc/src/sgml/func.sgml.orig 2010-04-07 08:12:52.000000000 +0200
--- ./doc/src/sgml/func.sgml 2010-05-05 20:36:31.692244426 +0200
***************
*** 4639,4645 ****
</para>
<para>
! If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
--- 4639,4645 ----
</para>
<para>
! If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
***************
*** 9501,9506 ****
--- 9501,9512 ----
<primary>string_to_array</primary>
</indexterm>
<indexterm>
+ <primary>to_array</primary>
+ </indexterm>
+ <indexterm>
+ <primary>to_string</primary>
+ </indexterm>
+ <indexterm>
<primary>unnest</primary>
</indexterm>
***************
*** 9643,9648 ****
--- 9649,9676 ----
<row>
<entry>
<literal>
+ <function>to_array</function>(<type>text</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text[]</type></entry>
+ <entry>splits string into array elements using supplied delimiter and null string</entry>
+ <entry><literal>to_array('1,2,3,,5', ',')</literal></entry>
+ <entry><literal>{1,2,3,4,NULL,5}</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
+ <function>to_string</function>(<type>anyarray</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>concatenates array elements using supplied delimiter and null string</entry>
+ <entry><literal>to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*')</literal></entry>
+ <entry><literal>1,2,3,*,5</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
<function>unnest</function>(<type>anyarray</type>)
</literal>
</entry>
*** ./src/backend/catalog/system_views.sql.orig 2010-04-26 16:22:37.000000000 +0200
--- ./src/backend/catalog/system_views.sql 2010-05-05 19:59:05.256243744 +0200
***************
*** 487,489 ****
--- 487,497 ----
CREATE OR REPLACE FUNCTION
pg_start_backup(label text, fast boolean DEFAULT false)
RETURNS text STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
+
+ CREATE OR REPLACE FUNCTION
+ to_string(v anyarray, fldsep text, null_string text DEFAULT '')
+ RETURNS text STRICT IMMUTABLE LANGUAGE internal AS 'to_string';
+
+ CREATE OR REPLACE FUNCTION
+ to_array(inputstr text, fldsep text, null_string text DEFAULT '')
+ RETURNS text[] STRICT IMMUTABLE LANGUAGE internal AS 'to_array';
*** ./src/backend/utils/adt/array_userfuncs.c.orig 2010-02-26 03:01:06.000000000 +0100
--- ./src/backend/utils/adt/array_userfuncs.c 2010-05-05 19:15:45.898243688 +0200
***************
*** 407,415 ****
--- 407,417 ----
create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims)
{
Datum dvalues[1];
+ bool nulls[1];
int16 typlen;
bool typbyval;
char typalign;
***************
*** 429,434 ****
--- 431,437 ----
ndims, MAXDIM)));
dvalues[0] = element;
+ nulls[0] = isNull;
for (i = 0; i < ndims; i++)
{
***************
*** 462,468 ****
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
! return construct_md_array(dvalues, NULL, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
--- 465,471 ----
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
! return construct_md_array(dvalues, nulls, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
*** ./src/backend/utils/adt/varlena.c.orig 2010-02-26 03:01:10.000000000 +0100
--- ./src/backend/utils/adt/varlena.c 2010-05-05 20:17:53.657243603 +0200
***************
*** 2965,2980 ****
}
/*
! * text_to_array
! * parse input string
! * return text array of elements
! * based on provided field separator
*/
! Datum
! text_to_array(PG_FUNCTION_ARGS)
{
- text *inputstring = PG_GETARG_TEXT_PP(0);
- text *fldsep = PG_GETARG_TEXT_PP(1);
int inputstring_len;
int fldsep_len;
TextPositionState state;
--- 2965,2986 ----
}
/*
! * Returns true when two text params are same.
*/
! static
! bool text_isequal(text *txt1, text *txt2)
! {
! return DatumGetBool(DirectFunctionCall2(texteq,
! PointerGetDatum(txt1),
! PointerGetDatum(txt2)));
! }
!
! /*
! * common code for text_to_array and to_array functions
! */
! static Datum
! _text_to_array(FunctionCallInfo fcinfo, text *inputstring, text *fldsep, text *null_string, bool *isNull)
{
int inputstring_len;
int fldsep_len;
TextPositionState state;
***************
*** 2985,2993 ****
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
text_position_setup(inputstring, fldsep, &state);
!
/*
* Note: we check the converted string length, not the original, because
* they could be different if the input contained invalid encoding.
--- 2991,3000 ----
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
+ bool is_null_string;
text_position_setup(inputstring, fldsep, &state);
!
/*
* Note: we check the converted string length, not the original, because
* they could be different if the input contained invalid encoding.
***************
*** 2999,3005 ****
if (inputstring_len < 1)
{
text_position_cleanup(&state);
! PG_RETURN_NULL();
}
/*
--- 3006,3013 ----
if (inputstring_len < 1)
{
text_position_cleanup(&state);
! *isNull = true;
! return (Datum) 0;
}
/*
***************
*** 3009,3016 ****
if (fldsep_len < 1)
{
text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
! PointerGetDatum(inputstring), 1));
}
start_posn = 1;
--- 3017,3026 ----
if (fldsep_len < 1)
{
text_position_cleanup(&state);
! is_null_string = null_string != NULL ? text_isequal(null_string, inputstring) : false;
! *isNull = false;
! return (Datum) create_singleton_array(fcinfo, TEXTOID,
! PointerGetDatum(inputstring), is_null_string, 1);
}
start_posn = 1;
***************
*** 3036,3046 ****
/* must build a temp text datum to pass to accumArrayResult */
result_text = cstring_to_text_with_len(start_ptr, chunk_len);
!
/* stash away this field */
astate = accumArrayResult(astate,
PointerGetDatum(result_text),
! false,
TEXTOID,
CurrentMemoryContext);
--- 3046,3057 ----
/* must build a temp text datum to pass to accumArrayResult */
result_text = cstring_to_text_with_len(start_ptr, chunk_len);
! is_null_string = null_string != NULL ? text_isequal(null_string, result_text) : false;
!
/* stash away this field */
astate = accumArrayResult(astate,
PointerGetDatum(result_text),
! is_null_string,
TEXTOID,
CurrentMemoryContext);
***************
*** 3057,3076 ****
text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
! CurrentMemoryContext));
}
/*
! * array_to_text
! * concatenate Cstring representation of input array elements
! * using provided field separator
*/
Datum
! array_to_text(PG_FUNCTION_ARGS)
{
- ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
- char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
int nitems,
*dims,
ndims;
--- 3068,3129 ----
text_position_cleanup(&state);
! *isNull = false;
! return makeArrayResult(astate, CurrentMemoryContext);
}
+
/*
! * text_to_array
! * parse input string
! * return text array of elements
! * based on provided field separator
*/
Datum
! text_to_array(PG_FUNCTION_ARGS)
! {
! text *inputstring = PG_GETARG_TEXT_PP(0);
! text *fldsep = PG_GETARG_TEXT_PP(1);
! bool isNull;
! Datum result;
!
! result = _text_to_array(fcinfo, inputstring, fldsep, NULL, &isNull);
!
! if (isNull)
! PG_RETURN_NULL();
! else
! PG_RETURN_ARRAYTYPE_P(result);
! }
!
! /*
! * to_array
! * Parse string, returns array. string defined as nullsym is replaced by NULL.
! * Default value for nullsym is empty string.
! */
! Datum
! to_array(PG_FUNCTION_ARGS)
! {
! text *inputstring = PG_GETARG_TEXT_PP(0);
! text *fldsep = PG_GETARG_TEXT_PP(1);
! text *null_string = PG_GETARG_TEXT_PP(2);
! bool isNull;
! Datum result;
!
! result = _text_to_array(fcinfo, inputstring, fldsep, null_string, &isNull);
!
! if (isNull)
! PG_RETURN_NULL();
! else
! PG_RETURN_ARRAYTYPE_P(result);
! }
!
! /*
! * common code for array_to_text and to_string function. null_string can be
! * NULL (only for array_to_text function).
! */
! static text *
! _array_to_text(FunctionCallInfo fcinfo, ArrayType *v, char *fldsep, char *null_string)
{
int nitems,
*dims,
ndims;
***************
*** 3092,3098 ****
/* if there are no elements, return an empty string */
if (nitems == 0)
! PG_RETURN_TEXT_P(cstring_to_text(""));
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
--- 3145,3151 ----
/* if there are no elements, return an empty string */
if (nitems == 0)
! return cstring_to_text("");
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
***************
*** 3140,3146 ****
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
! /* we ignore nulls */
}
else
{
--- 3193,3207 ----
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
! /* we ignore nulls, when null_string isn't defined (is NULL) */
! if (null_string != NULL)
! {
! if (printed)
! appendStringInfo(&buf, "%s%s", fldsep, null_string);
! else
! appendStringInfoString(&buf, null_string);
! printed = true;
! }
}
else
{
***************
*** 3170,3178 ****
}
}
! PG_RETURN_TEXT_P(cstring_to_text_with_len(buf.data, buf.len));
}
#define HEXBASE 16
/*
* Convert a int32 to a string containing a base 16 (hex) representation of
--- 3231,3269 ----
}
}
! return cstring_to_text_with_len(buf.data, buf.len);
}
+ /*
+ * array_to_text
+ * concatenate Cstring representation of input array elements
+ * using provided field separator
+ */
+ Datum
+ array_to_text(PG_FUNCTION_ARGS)
+ {
+ ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
+ char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+ PG_RETURN_TEXT_P(_array_to_text(fcinfo, v, fldsep, NULL));
+ }
+
+ /*
+ * to_string
+ * concatenate Cstring representation of input array elements
+ * using provided field separator and null string
+ */
+ Datum
+ to_string(PG_FUNCTION_ARGS)
+ {
+ ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
+ char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
+ char *null_string = text_to_cstring(PG_GETARG_TEXT_PP(2));
+
+ PG_RETURN_TEXT_P(_array_to_text(fcinfo, v, fldsep, null_string));
+ }
+
+
#define HEXBASE 16
/*
* Convert a int32 to a string containing a base 16 (hex) representation of
*** ./src/include/catalog/pg_proc.h.orig 2010-02-26 03:01:21.000000000 +0100
--- ./src/include/catalog/pg_proc.h 2010-05-05 19:41:37.518244240 +0200
***************
*** 1022,1027 ****
--- 1022,1031 ----
DESCR("split delimited text into text[]");
DATA(insert OID = 395 ( array_to_string PGNSP PGUID 12 1 0 0 f f f t f i 2 0 25 "2277 25" _null_ _null_ _null_ _null_ array_to_text _null_ _null_ _null_ ));
DESCR("concatenate array elements, using delimiter, into text");
+ DATA(insert OID = 950 ( to_array PGNSP PGUID 12 1 0 0 f f f t f i 3 0 1009 "25 25 25" _null_ _null_ _null_ _null_ to_array _null_ _null_ _null_ ));
+ DESCR("split delimited text into text[], possible to set null string");
+ DATA(insert OID = 951 ( to_string PGNSP PGUID 12 1 0 0 f f f t f i 3 0 25 "2277 25 25" _null_ _null_ _null_ _null_ to_string _null_ _null_ _null_ ));
+ DESCR("concatenate array elements, using delimiter, into text, possible to set null string");
DATA(insert OID = 515 ( array_larger PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 516 ( array_smaller PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_smaller _null_ _null_ _null_ ));
*** ./src/include/utils/array.h.orig 2010-01-02 17:58:09.000000000 +0100
--- ./src/include/utils/array.h 2010-05-05 18:02:44.191119209 +0200
***************
*** 274,279 ****
--- 274,280 ----
extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims);
extern Datum array_agg_transfn(PG_FUNCTION_ARGS);
*** ./src/include/utils/builtins.h.orig 2010-02-26 03:01:28.000000000 +0100
--- ./src/include/utils/builtins.h 2010-05-05 18:36:03.468243638 +0200
***************
*** 714,719 ****
--- 714,721 ----
extern Datum split_text(PG_FUNCTION_ARGS);
extern Datum text_to_array(PG_FUNCTION_ARGS);
extern Datum array_to_text(PG_FUNCTION_ARGS);
+ extern Datum to_array(PG_FUNCTION_ARGS);
+ extern Datum to_string(PG_FUNCTION_ARGS);
extern Datum to_hex32(PG_FUNCTION_ARGS);
extern Datum to_hex64(PG_FUNCTION_ARGS);
extern Datum md5_text(PG_FUNCTION_ARGS);
*** ./src/test/regress/expected/arrays.out.orig 2010-02-18 19:41:47.000000000 +0100
--- ./src/test/regress/expected/arrays.out 2010-05-05 20:19:26.000000000 +0200
***************
*** 1208,1210 ****
--- 1208,1265 ----
[5:5]={"(42,43)"}
(1 row)
+ -- check to_string and to_array functions
+ select to_array('abc','');
+ to_array
+ ----------
+ {abc}
+ (1 row)
+
+ select to_array('abc','','abc');
+ to_array
+ ----------
+ {NULL}
+ (1 row)
+
+ select to_array('abc',',');
+ to_array
+ ----------
+ {abc}
+ (1 row)
+
+ select to_array('abc',',','abc');
+ to_array
+ ----------
+ {NULL}
+ (1 row)
+
+ select to_array('1,2,3,4,,6',',');
+ to_array
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select to_array('1,2,3,4,,6',',','');
+ to_array
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select to_array('1,2,3,4,*,6',',','*');
+ to_array
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select to_string(array[1,2,3,4,NULL,6],',');
+ to_string
+ ------------
+ 1,2,3,4,,6
+ (1 row)
+
+ select to_string(array[1,2,3,4,NULL,6],',','*');
+ to_string
+ -------------
+ 1,2,3,4,*,6
+ (1 row)
+
*** ./src/test/regress/sql/arrays.sql.orig 2010-02-18 19:41:47.000000000 +0100
--- ./src/test/regress/sql/arrays.sql 2010-05-05 20:13:12.194118475 +0200
***************
*** 412,414 ****
--- 412,426 ----
select * from t1;
update t1 set f1[5].q2 = 43;
select * from t1;
+
+ -- check to_string and to_array functions
+
+ select to_array('abc','');
+ select to_array('abc','','abc');
+ select to_array('abc',',');
+ select to_array('abc',',','abc');
+ select to_array('1,2,3,4,,6',',');
+ select to_array('1,2,3,4,,6',',','');
+ select to_array('1,2,3,4,*,6',',','*');
+ select to_string(array[1,2,3,4,NULL,6],',');
+ select to_string(array[1,2,3,4,NULL,6],',','*');
On 6 May 2010 04:42, Pavel Stehule <pavel.stehule@gmail.com> wrote:
attached patch contains to_string and to_array functions. These
functions are equivalent of array_to_string and string_to_array
function with maybe more correct NULL handling.
Hi Pavel,
I am reviewing your patch for the commitfest.
Overall the patch looks good, although there were some bogus
whitespace changes in the patch and some messy punctuation/grammar in
some of the code comments. I also thought it was worth mentioning in
the docs the default value for null_string is ''. I made an attempt
to clean those items up and have attached a v2 of the patch.
Regarding the behaviour of the third argument (null_string), I was a
little surprised by the results when I passed in a NULL.
postgres=# select to_string(array['a', 'b', 'c', 'd'], '/', NULL);
to_string
-----------
Now, if the array had some NULL elements in it, I could understand why
the resulting string would be NULL (because str || NULL is NULL), but
in this case there are no NULLs. Why is the result NULL? Surely it
should be 'a/b/c/d' regardless of how the third parameter is set?
In the reverse case:
postgres=# select to_array('a/b/c/d', '/', NULL);
to_array
----------
(1 row)
Again I find this a bit weird. I have left the null_string NULL,
which means it is unknown. It can't possibly match any value in the
string, so effectively passing in a NULL null_string should mean that
the user doesn't want any string items whatsoever to translate into
NULLs in the resulting array. I would expect this call to return
{a,b,c,d}.
Cheers,
BJ
Attachments:
to_array-v2.difftext/plain; charset=US-ASCII; name=to_array-v2.diffDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 70dab53..2256d9c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9544,6 +9544,12 @@ SELECT NULLIF(value, '(none)') ...
<primary>string_to_array</primary>
</indexterm>
<indexterm>
+ <primary>to_array</primary>
+ </indexterm>
+ <indexterm>
+ <primary>to_string</primary>
+ </indexterm>
+ <indexterm>
<primary>unnest</primary>
</indexterm>
@@ -9686,6 +9692,28 @@ SELECT NULLIF(value, '(none)') ...
<row>
<entry>
<literal>
+ <function>to_array</function>(<type>text</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text[]</type></entry>
+ <entry>splits string into array elements using supplied delimiter and null string (defaults to the empty string)</entry>
+ <entry><literal>to_array('1,2,3,,5', ',')</literal></entry>
+ <entry><literal>{1,2,3,4,NULL,5}</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
+ <function>to_string</function>(<type>anyarray</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>concatenates array elements using supplied delimiter and null string (defaults to the empty string)</entry>
+ <entry><literal>to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*')</literal></entry>
+ <entry><literal>1,2,3,*,5</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
<function>unnest</function>(<type>anyarray</type>)
</literal>
</entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8852326..24a79e8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -487,3 +487,11 @@ COMMENT ON FUNCTION ts_debug(text) IS
CREATE OR REPLACE FUNCTION
pg_start_backup(label text, fast boolean DEFAULT false)
RETURNS text STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
+
+CREATE OR REPLACE FUNCTION
+ to_string(v anyarray, fldsep text, null_string text DEFAULT '')
+ RETURNS text STRICT IMMUTABLE LANGUAGE internal AS 'to_string';
+
+CREATE OR REPLACE FUNCTION
+ to_array(inputstr text, fldsep text, null_string text DEFAULT '')
+ RETURNS text[] STRICT IMMUTABLE LANGUAGE internal AS 'to_array';
diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c
index 00e9c54..7f88b82 100644
--- a/src/backend/utils/adt/array_userfuncs.c
+++ b/src/backend/utils/adt/array_userfuncs.c
@@ -407,9 +407,11 @@ ArrayType *
create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims)
{
Datum dvalues[1];
+ bool nulls[1];
int16 typlen;
bool typbyval;
char typalign;
@@ -429,6 +431,7 @@ create_singleton_array(FunctionCallInfo fcinfo,
ndims, MAXDIM)));
dvalues[0] = element;
+ nulls[0] = isNull;
for (i = 0; i < ndims; i++)
{
@@ -462,7 +465,7 @@ create_singleton_array(FunctionCallInfo fcinfo,
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
- return construct_md_array(dvalues, NULL, ndims, dims, lbs, element_type,
+ return construct_md_array(dvalues, nulls, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index af28c15..676fdc1 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2965,16 +2965,22 @@ split_text(PG_FUNCTION_ARGS)
}
/*
- * text_to_array
- * parse input string
- * return text array of elements
- * based on provided field separator
+ * Return true when two text params are equal.
*/
-Datum
-text_to_array(PG_FUNCTION_ARGS)
+static
+bool text_isequal(text *txt1, text *txt2)
+{
+ return DatumGetBool(DirectFunctionCall2(texteq,
+ PointerGetDatum(txt1),
+ PointerGetDatum(txt2)));
+}
+
+/*
+ * Common code for text_to_array and to_array functions.
+ */
+static Datum
+_text_to_array(FunctionCallInfo fcinfo, text *inputstring, text *fldsep, text *null_string, bool *isNull)
{
- text *inputstring = PG_GETARG_TEXT_PP(0);
- text *fldsep = PG_GETARG_TEXT_PP(1);
int inputstring_len;
int fldsep_len;
TextPositionState state;
@@ -2985,9 +2991,10 @@ text_to_array(PG_FUNCTION_ARGS)
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
+ bool is_null_string;
text_position_setup(inputstring, fldsep, &state);
-
+
/*
* Note: we check the converted string length, not the original, because
* they could be different if the input contained invalid encoding.
@@ -2999,7 +3006,8 @@ text_to_array(PG_FUNCTION_ARGS)
if (inputstring_len < 1)
{
text_position_cleanup(&state);
- PG_RETURN_NULL();
+ *isNull = true;
+ return (Datum) 0;
}
/*
@@ -3009,8 +3017,10 @@ text_to_array(PG_FUNCTION_ARGS)
if (fldsep_len < 1)
{
text_position_cleanup(&state);
- PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
- PointerGetDatum(inputstring), 1));
+ is_null_string = null_string != NULL ? text_isequal(null_string, inputstring) : false;
+ *isNull = false;
+ return (Datum) create_singleton_array(fcinfo, TEXTOID,
+ PointerGetDatum(inputstring), is_null_string, 1);
}
start_posn = 1;
@@ -3036,11 +3046,12 @@ text_to_array(PG_FUNCTION_ARGS)
/* must build a temp text datum to pass to accumArrayResult */
result_text = cstring_to_text_with_len(start_ptr, chunk_len);
-
+ is_null_string = null_string != NULL ? text_isequal(null_string, result_text) : false;
+
/* stash away this field */
astate = accumArrayResult(astate,
PointerGetDatum(result_text),
- false,
+ is_null_string,
TEXTOID,
CurrentMemoryContext);
@@ -3057,20 +3068,65 @@ text_to_array(PG_FUNCTION_ARGS)
text_position_cleanup(&state);
- PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
- CurrentMemoryContext));
+ *isNull = false;
+ return makeArrayResult(astate, CurrentMemoryContext);
}
/*
- * array_to_text
- * concatenate Cstring representation of input array elements
- * using provided field separator
+ * text_to_array
+ * parse input string
+ * return text array of elements
+ * based on provided field separator
*/
Datum
-array_to_text(PG_FUNCTION_ARGS)
+text_to_array(PG_FUNCTION_ARGS)
+{
+ text *inputstring = PG_GETARG_TEXT_PP(0);
+ text *fldsep = PG_GETARG_TEXT_PP(1);
+ bool isNull;
+ Datum result;
+
+ result = _text_to_array(fcinfo, inputstring, fldsep, NULL, &isNull);
+
+ if (isNull)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_ARRAYTYPE_P(result);
+}
+
+/*
+ * to_array
+ *
+ * Parse a string delimited by fldsep, and return an array. Any instances of
+ * null_string in the inputstring are replaced by NULL elements in the result
+ * array.
+ *
+ * The default value for null_string is the empty string.
+ */
+Datum
+to_array(PG_FUNCTION_ARGS)
+{
+ text *inputstring = PG_GETARG_TEXT_PP(0);
+ text *fldsep = PG_GETARG_TEXT_PP(1);
+ text *null_string = PG_GETARG_TEXT_PP(2);
+ bool isNull;
+ Datum result;
+
+ result = _text_to_array(fcinfo, inputstring, fldsep, null_string, &isNull);
+
+ if (isNull)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_ARRAYTYPE_P(result);
+}
+
+/*
+ * Common code for array_to_text and to_string functions. null_string can be
+ * NULL (only for array_to_text function).
+ */
+static text *
+_array_to_text(FunctionCallInfo fcinfo, ArrayType *v, char *fldsep, char *null_string)
{
- ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
- char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
int nitems,
*dims,
ndims;
@@ -3092,7 +3148,7 @@ array_to_text(PG_FUNCTION_ARGS)
/* if there are no elements, return an empty string */
if (nitems == 0)
- PG_RETURN_TEXT_P(cstring_to_text(""));
+ return cstring_to_text("");
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
@@ -3140,7 +3196,15 @@ array_to_text(PG_FUNCTION_ARGS)
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
- /* we ignore nulls */
+ /* we ignore nulls, when null_string isn't defined (is NULL) */
+ if (null_string != NULL)
+ {
+ if (printed)
+ appendStringInfo(&buf, "%s%s", fldsep, null_string);
+ else
+ appendStringInfoString(&buf, null_string);
+ printed = true;
+ }
}
else
{
@@ -3170,9 +3234,41 @@ array_to_text(PG_FUNCTION_ARGS)
}
}
- PG_RETURN_TEXT_P(cstring_to_text_with_len(buf.data, buf.len));
+ return cstring_to_text_with_len(buf.data, buf.len);
}
+/*
+ * array_to_text
+ * concatenate Cstring representation of input array elements
+ * using provided field separator
+ */
+Datum
+array_to_text(PG_FUNCTION_ARGS)
+{
+ ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
+ char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+ PG_RETURN_TEXT_P(_array_to_text(fcinfo, v, fldsep, NULL));
+}
+
+/*
+ * to_string
+ *
+ * Concatenate cstring representation of input array elements using the
+ * provided field separator. NULL elements in the input array are rendered
+ * using the provided null_string, which defaults to the empty string.
+ */
+Datum
+to_string(PG_FUNCTION_ARGS)
+{
+ ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
+ char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
+ char *null_string = text_to_cstring(PG_GETARG_TEXT_PP(2));
+
+ PG_RETURN_TEXT_P(_array_to_text(fcinfo, v, fldsep, null_string));
+}
+
+
#define HEXBASE 16
/*
* Convert a int32 to a string containing a base 16 (hex) representation of
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6036493..999b331 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -1022,6 +1022,10 @@ DATA(insert OID = 394 ( string_to_array PGNSP PGUID 12 1 0 0 f f f t f i 2 0
DESCR("split delimited text into text[]");
DATA(insert OID = 395 ( array_to_string PGNSP PGUID 12 1 0 0 f f f t f i 2 0 25 "2277 25" _null_ _null_ _null_ _null_ array_to_text _null_ _null_ _null_ ));
DESCR("concatenate array elements, using delimiter, into text");
+DATA(insert OID = 950 ( to_array PGNSP PGUID 12 1 0 0 f f f t f i 3 0 1009 "25 25 25" _null_ _null_ _null_ _null_ to_array _null_ _null_ _null_ ));
+DESCR("split delimited text into text[], possible to set null string");
+DATA(insert OID = 951 ( to_string PGNSP PGUID 12 1 0 0 f f f t f i 3 0 25 "2277 25 25" _null_ _null_ _null_ _null_ to_string _null_ _null_ _null_ ));
+DESCR("concatenate array elements, using delimiter, into text, possible to set null string");
DATA(insert OID = 515 ( array_larger PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 516 ( array_smaller PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_smaller _null_ _null_ _null_ ));
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 1376f8c..db9a615 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -274,6 +274,7 @@ extern Datum array_cat(PG_FUNCTION_ARGS);
extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims);
extern Datum array_agg_transfn(PG_FUNCTION_ARGS);
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index 45123fd..6c539e1 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -715,6 +715,8 @@ extern text *replace_text_regexp(text *src_text, void *regexp,
extern Datum split_text(PG_FUNCTION_ARGS);
extern Datum text_to_array(PG_FUNCTION_ARGS);
extern Datum array_to_text(PG_FUNCTION_ARGS);
+extern Datum to_array(PG_FUNCTION_ARGS);
+extern Datum to_string(PG_FUNCTION_ARGS);
extern Datum to_hex32(PG_FUNCTION_ARGS);
extern Datum to_hex64(PG_FUNCTION_ARGS);
extern Datum md5_text(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 3ab18be..df9bec6 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -1208,3 +1208,58 @@ select * from t1;
[5:5]={"(42,43)"}
(1 row)
+-- check to_string and to_array functions
+select to_array('abc','');
+ to_array
+----------
+ {abc}
+(1 row)
+
+select to_array('abc','','abc');
+ to_array
+----------
+ {NULL}
+(1 row)
+
+select to_array('abc',',');
+ to_array
+----------
+ {abc}
+(1 row)
+
+select to_array('abc',',','abc');
+ to_array
+----------
+ {NULL}
+(1 row)
+
+select to_array('1,2,3,4,,6',',');
+ to_array
+------------------
+ {1,2,3,4,NULL,6}
+(1 row)
+
+select to_array('1,2,3,4,,6',',','');
+ to_array
+------------------
+ {1,2,3,4,NULL,6}
+(1 row)
+
+select to_array('1,2,3,4,*,6',',','*');
+ to_array
+------------------
+ {1,2,3,4,NULL,6}
+(1 row)
+
+select to_string(array[1,2,3,4,NULL,6],',');
+ to_string
+------------
+ 1,2,3,4,,6
+(1 row)
+
+select to_string(array[1,2,3,4,NULL,6],',','*');
+ to_string
+-------------
+ 1,2,3,4,*,6
+(1 row)
+
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index f65bc45..dd149bb 100644
--- a/src/test/regress/sql/arrays.sql
+++ b/src/test/regress/sql/arrays.sql
@@ -412,3 +412,15 @@ insert into t1 (f1[5].q1) values(42);
select * from t1;
update t1 set f1[5].q2 = 43;
select * from t1;
+
+-- check to_string and to_array functions
+
+select to_array('abc','');
+select to_array('abc','','abc');
+select to_array('abc',',');
+select to_array('abc',',','abc');
+select to_array('1,2,3,4,,6',',');
+select to_array('1,2,3,4,,6',',','');
+select to_array('1,2,3,4,*,6',',','*');
+select to_string(array[1,2,3,4,NULL,6],',');
+select to_string(array[1,2,3,4,NULL,6],',','*');
Hello
2010/7/16 Brendan Jurd <direvus@gmail.com>:
On 6 May 2010 04:42, Pavel Stehule <pavel.stehule@gmail.com> wrote:
attached patch contains to_string and to_array functions. These
functions are equivalent of array_to_string and string_to_array
function with maybe more correct NULL handling.Hi Pavel,
I am reviewing your patch for the commitfest.
Overall the patch looks good, although there were some bogus
whitespace changes in the patch and some messy punctuation/grammar in
some of the code comments. I also thought it was worth mentioning in
the docs the default value for null_string is ''. I made an attempt
to clean those items up and have attached a v2 of the patch.Regarding the behaviour of the third argument (null_string), I was a
little surprised by the results when I passed in a NULL.postgres=# select to_string(array['a', 'b', 'c', 'd'], '/', NULL);
to_string
-----------Now, if the array had some NULL elements in it, I could understand why
the resulting string would be NULL (because str || NULL is NULL), but
in this case there are no NULLs. Why is the result NULL? Surely it
should be 'a/b/c/d' regardless of how the third parameter is set?In the reverse case:
postgres=# select to_array('a/b/c/d', '/', NULL);
to_array
----------(1 row)
I didn't thinking about NULL as separator before. Current behave isn't
practical. When default separator is empty string, then NULL can be
used as ignore NULLs - so it can emulate current string_to_array and
array_to_string behave. It can be, because NULL can't be a separator
ever.
select to_string(array[1,2,3,null,5], ',') -> 1,2,3,,5
select to_string(array[1,2,3,null,5], ',', null) -> 1,2,3,5
maybe - next idea and maybe better - we can check NOT NULL for
separator and to add other parameter with default = false -
ignore_null
select to_string(array[1,2,3,null,5], ',', ignore_null := true) -> 1,2,3,5
what do you think?
Regards
Pavel
Show quoted text
Again I find this a bit weird. I have left the null_string NULL,
which means it is unknown. It can't possibly match any value in the
string, so effectively passing in a NULL null_string should mean that
the user doesn't want any string items whatsoever to translate into
NULLs in the resulting array. I would expect this call to return
{a,b,c,d}.Cheers,
BJ
On 17 July 2010 02:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/16 Brendan Jurd <direvus@gmail.com>:
Regarding the behaviour of the third argument (null_string), I was a
little surprised by the results when I passed in a NULL.I didn't thinking about NULL as separator before. Current behave isn't
practical. When default separator is empty string, then NULL can be
used as ignore NULLs - so it can emulate current string_to_array and
array_to_string behave. It can be, because NULL can't be a separator
ever.select to_string(array[1,2,3,null,5], ',') -> 1,2,3,,5
select to_string(array[1,2,3,null,5], ',', null) -> 1,2,3,5maybe - next idea and maybe better - we can check NOT NULL for
separator and to add other parameter with default = false -
ignore_nullselect to_string(array[1,2,3,null,5], ',', ignore_null := true) -> 1,2,3,5
what do you think?
I don't have any problem with null_string = NULL in to_string taking
the meaning "skip over NULL elements". It's a slightly strange
outcome but it's more useful than returning NULL, and I do like that
it gives us a path to the current array_to_string() treatment even if
those functions are ultimately deprecated. I think adding a fourth
keyword argument might be sacrificing a little too much convenience in
the calling convention.
As for to_array, null_string = NULL should mean that there is no
string which should result in a NULL element. So I would be happy to
see the following set of behaviours:
to_string(array[1, 2, 3, 4, 5], ',', null) = '1,2,3,4,5'
to_string(array[1, 2, 3, null, 5], ',', null) = '1,2,3,5'
to_array('1,2,3,,5', ',', null) = '{1,2,3,"",5}'
Also, if we're going to make the function non-strict, we need to
consider how to respond when the user specifies NULL for the other
arguments. If the field separator is NULL, bearing in mind that NULL
can't match any string, I would expect that to_array would return the
undivided string as a single array element, and that to_string would
throw an error:
to_array('1,2,3,4,5', null) = '{"1,2,3,4,5"}'
to_string(array[1,2,3,4,5], null) = ERROR: the field separator for
to_string may not be NULL
If the first argument is NULL for either function, I think it would be
reasonable to return NULL. But I could be convinced that we should
throw an error in that case too.
Cheers,
BJ
2010/7/16 Brendan Jurd <direvus@gmail.com>:
On 17 July 2010 02:15, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/16 Brendan Jurd <direvus@gmail.com>:
Regarding the behaviour of the third argument (null_string), I was a
little surprised by the results when I passed in a NULL.I didn't thinking about NULL as separator before. Current behave isn't
practical. When default separator is empty string, then NULL can be
used as ignore NULLs - so it can emulate current string_to_array and
array_to_string behave. It can be, because NULL can't be a separator
ever.select to_string(array[1,2,3,null,5], ',') -> 1,2,3,,5
select to_string(array[1,2,3,null,5], ',', null) -> 1,2,3,5maybe - next idea and maybe better - we can check NOT NULL for
separator and to add other parameter with default = false -
ignore_nullselect to_string(array[1,2,3,null,5], ',', ignore_null := true) -> 1,2,3,5
what do you think?
I don't have any problem with null_string = NULL in to_string taking
the meaning "skip over NULL elements". It's a slightly strange
outcome but it's more useful than returning NULL, and I do like that
it gives us a path to the current array_to_string() treatment even if
those functions are ultimately deprecated. I think adding a fourth
keyword argument might be sacrificing a little too much convenience in
the calling convention.As for to_array, null_string = NULL should mean that there is no
string which should result in a NULL element. So I would be happy to
see the following set of behaviours:to_string(array[1, 2, 3, 4, 5], ',', null) = '1,2,3,4,5'
to_string(array[1, 2, 3, null, 5], ',', null) = '1,2,3,5'
to_array('1,2,3,,5', ',', null) = '{1,2,3,"",5}'Also, if we're going to make the function non-strict, we need to
consider how to respond when the user specifies NULL for the other
arguments. If the field separator is NULL, bearing in mind that NULL
can't match any string, I would expect that to_array would return the
undivided string as a single array element, and that to_string would
throw an error:
ok, it has a sense.
other question is empty string as separator - but I think, it can has
same behave like string_to_array and array_to_string functions.
to_array('1,2,3,4,5', null) = '{"1,2,3,4,5"}'
to_string(array[1,2,3,4,5], null) = ERROR: the field separator for
to_string may not be NULLIf the first argument is NULL for either function, I think it would be
reasonable to return NULL. But I could be convinced that we should
throw an error in that case too.
I agree - I prefer a NULL
Thank You very much
Pavel
Show quoted text
Cheers,
BJ
On 17 July 2010 04:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/16 Brendan Jurd <direvus@gmail.com>:
Also, if we're going to make the function non-strict, we need to
consider how to respond when the user specifies NULL for the other
arguments. If the field separator is NULL, bearing in mind that NULL
can't match any string, I would expect that to_array would return the
undivided string as a single array element, and that to_string would
throw an error:ok, it has a sense.
other question is empty string as separator - but I think, it can has
same behave like string_to_array and array_to_string functions.
Agreed. Those behaviours seem sensible.
If the first argument is NULL for either function, I think it would be
reasonable to return NULL. But I could be convinced that we should
throw an error in that case too.I agree - I prefer a NULL
Thank You very much
No worries; I will await a revised patch from you which updates these
behaviours -- please incorporate the doc/comment changes I posted
earlier -- I will then do a further review before handing off to a
committer.
Cheers,
BJ
Hello
here is a new version - new these functions are not a strict and
function to_string is marked as stable.
both functions share code with older version.
Regards
Pavel
2010/7/16 Brendan Jurd <direvus@gmail.com>:
Show quoted text
On 17 July 2010 04:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/16 Brendan Jurd <direvus@gmail.com>:
Also, if we're going to make the function non-strict, we need to
consider how to respond when the user specifies NULL for the other
arguments. If the field separator is NULL, bearing in mind that NULL
can't match any string, I would expect that to_array would return the
undivided string as a single array element, and that to_string would
throw an error:ok, it has a sense.
other question is empty string as separator - but I think, it can has
same behave like string_to_array and array_to_string functions.Agreed. Those behaviours seem sensible.
If the first argument is NULL for either function, I think it would be
reasonable to return NULL. But I could be convinced that we should
throw an error in that case too.I agree - I prefer a NULL
Thank You very much
No worries; I will await a revised patch from you which updates these
behaviours -- please incorporate the doc/comment changes I posted
earlier -- I will then do a further review before handing off to a
committer.Cheers,
BJ
Attachments:
arraytext.diffapplication/octet-stream; name=arraytext.diffDownload
*** ./doc/src/sgml/func.sgml.orig 2010-07-03 19:21:48.000000000 +0200
--- ./doc/src/sgml/func.sgml 2010-07-20 08:44:59.267804371 +0200
***************
*** 4652,4658 ****
</para>
<para>
! If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
--- 4652,4658 ----
</para>
<para>
! If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
***************
*** 9544,9549 ****
--- 9544,9555 ----
<primary>string_to_array</primary>
</indexterm>
<indexterm>
+ <primary>to_array</primary>
+ </indexterm>
+ <indexterm>
+ <primary>to_string</primary>
+ </indexterm>
+ <indexterm>
<primary>unnest</primary>
</indexterm>
***************
*** 9686,9691 ****
--- 9692,9719 ----
<row>
<entry>
<literal>
+ <function>to_array</function>(<type>text</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text[]</type></entry>
+ <entry>splits string into array elements using supplied delimiter and null string</entry>
+ <entry><literal>to_array('1,2,3,,5', ',')</literal></entry>
+ <entry><literal>{1,2,3,4,NULL,5}</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
+ <function>to_string</function>(<type>anyarray</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>concatenates array elements using supplied delimiter and null string</entry>
+ <entry><literal>to_string(ARRAY[1, 2, 3, NULL, 5], ',', '*')</literal></entry>
+ <entry><literal>1,2,3,*,5</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
<function>unnest</function>(<type>anyarray</type>)
</literal>
</entry>
*** ./src/backend/catalog/system_views.sql.orig 2010-07-20 08:44:18.940804380 +0200
--- ./src/backend/catalog/system_views.sql 2010-07-20 11:20:20.415803330 +0200
***************
*** 487,489 ****
--- 487,497 ----
CREATE OR REPLACE FUNCTION
pg_start_backup(label text, fast boolean DEFAULT false)
RETURNS text STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
+
+ CREATE OR REPLACE FUNCTION
+ to_string(v anyarray, fldsep text, null_string text DEFAULT '')
+ RETURNS text STABLE LANGUAGE internal AS 'to_string';
+
+ CREATE OR REPLACE FUNCTION
+ to_array(inputstr text, fldsep text, null_string text DEFAULT '')
+ RETURNS text[] IMMUTABLE LANGUAGE internal AS 'to_array';
*** ./src/backend/utils/adt/array_userfuncs.c.orig 2010-07-20 08:44:18.941805977 +0200
--- ./src/backend/utils/adt/array_userfuncs.c 2010-07-20 08:44:59.271806788 +0200
***************
*** 407,415 ****
--- 407,417 ----
create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims)
{
Datum dvalues[1];
+ bool nulls[1];
int16 typlen;
bool typbyval;
char typalign;
***************
*** 429,434 ****
--- 431,437 ----
ndims, MAXDIM)));
dvalues[0] = element;
+ nulls[0] = isNull;
for (i = 0; i < ndims; i++)
{
***************
*** 462,468 ****
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
! return construct_md_array(dvalues, NULL, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
--- 465,471 ----
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
! return construct_md_array(dvalues, nulls, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
*** ./src/backend/utils/adt/varlena.c.orig 2010-07-20 08:44:18.943806309 +0200
--- ./src/backend/utils/adt/varlena.c 2010-07-20 13:19:45.006803452 +0200
***************
*** 75,80 ****
--- 75,83 ----
static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
static StringInfo makeStringAggState(FunctionCallInfo fcinfo);
+ static Datum _to_array(PG_FUNCTION_ARGS);
+ static Datum _to_text(PG_FUNCTION_ARGS);
+
/*****************************************************************************
* CONVERSION ROUTINES EXPORTED FOR USE BY C CODE *
***************
*** 2965,2970 ****
--- 2968,2984 ----
}
/*
+ * Returns true when two text params are same.
+ */
+ static
+ bool text_isequal(text *txt1, text *txt2)
+ {
+ return DatumGetBool(DirectFunctionCall2(texteq,
+ PointerGetDatum(txt1),
+ PointerGetDatum(txt2)));
+ }
+
+ /*
* text_to_array
* parse input string
* return text array of elements
***************
*** 2973,3076 ****
Datum
text_to_array(PG_FUNCTION_ARGS)
{
! text *inputstring = PG_GETARG_TEXT_PP(0);
! text *fldsep = PG_GETARG_TEXT_PP(1);
int inputstring_len;
- int fldsep_len;
- TextPositionState state;
- int fldnum;
- int start_posn;
- int end_posn;
- int chunk_len;
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
! text_position_setup(inputstring, fldsep, &state);
!
! /*
! * Note: we check the converted string length, not the original, because
! * they could be different if the input contained invalid encoding.
! */
! inputstring_len = state.len1;
! fldsep_len = state.len2;
!
! /* return NULL for empty input string */
! if (inputstring_len < 1)
! {
! text_position_cleanup(&state);
PG_RETURN_NULL();
! }
! /*
! * empty field separator return one element, 1D, array using the input
! * string
*/
! if (fldsep_len < 1)
! {
! text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
! PointerGetDatum(inputstring), 1));
! }
!
! start_posn = 1;
! /* start_ptr points to the start_posn'th character of inputstring */
! start_ptr = VARDATA_ANY(inputstring);
!
! for (fldnum = 1;; fldnum++) /* field number is 1 based */
{
! CHECK_FOR_INTERRUPTS();
!
! end_posn = text_position_next(start_posn, &state);
! if (end_posn == 0)
{
! /* fetch last field */
! chunk_len = ((char *) inputstring + VARSIZE_ANY(inputstring)) - start_ptr;
}
! else
{
! /* fetch non-last field */
! chunk_len = charlen_to_bytelen(start_ptr, end_posn - start_posn);
}
! /* must build a temp text datum to pass to accumArrayResult */
! result_text = cstring_to_text_with_len(start_ptr, chunk_len);
! /* stash away this field */
! astate = accumArrayResult(astate,
PointerGetDatum(result_text),
! false,
! TEXTOID,
! CurrentMemoryContext);
! pfree(result_text);
! if (end_posn == 0)
! break;
! start_posn = end_posn;
! start_ptr += chunk_len;
! start_posn += fldsep_len;
! start_ptr += charlen_to_bytelen(start_ptr, fldsep_len);
}
! text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
! CurrentMemoryContext));
}
/*
* array_to_text
* concatenate Cstring representation of input array elements
! * using provided field separator
*/
Datum
array_to_text(PG_FUNCTION_ARGS)
{
! ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
! char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
int nitems,
*dims,
ndims;
--- 2987,3197 ----
Datum
text_to_array(PG_FUNCTION_ARGS)
{
! text *inputstr = PG_GETARG_TEXT_PP(0);
!
! /* function string_to_array returns a NULL for empty string */
! if (VARSIZE_ANY_EXHDR(inputstr) < 1)
! PG_RETURN_NULL();
!
! return _to_array(fcinfo);
! }
!
! /*
! * just only outer wrapper
! */
! Datum
! to_array(PG_FUNCTION_ARGS)
! {
! return _to_array(fcinfo);
! }
!
! /*
! * _to_array - common code for to_array and text_to_array functions
! */
! static Datum
! _to_array(PG_FUNCTION_ARGS)
! {
int inputstring_len;
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
+ bool is_null;
+ text *inputstring;
+ text *fldsep;
+ text *null_string;
! /* when input string is NULL, then result is NULL too */
! if (PG_ARGISNULL(0))
PG_RETURN_NULL();
!
! inputstring = PG_GETARG_TEXT_PP(0);
! /*
! * when null string is NULL, there are not string representation for NULL,
! * the behave is compatible old text_to_array function. to_array function has
! * three arguments - last argument has default value. text_to_array function
! * has two arguments. When null_string is NULL, then behave is same like
! * old text_to_array (string_to_array) function.
*/
! if (PG_NARGS() > 2)
! null_string = (!PG_ARGISNULL(2)) ? PG_GETARG_TEXT_PP(2) : NULL;
! else
! null_string = NULL;
!
! /*
! * when separator is NULL, then returns every character as field,
! * there are no any character used as separator - etc separator is
! * space between chars.
! */
! if (!PG_ARGISNULL(1))
{
! TextPositionState state;
! int fldnum;
! int start_posn;
! int end_posn;
! int chunk_len;
!
! fldsep = PG_GETARG_TEXT_PP(1);
!
! text_position_setup(inputstring, fldsep, &state);
! /* return empty array for empty input string */
! if (state.len1 < 1)
{
! text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
}
!
! /*
! * empty field separator return one element, 1D, array using the input
! * string
! */
! if (state.len2 < 1)
{
! text_position_cleanup(&state);
! /* single element can be a NULL too */
! is_null = (null_string != NULL) ? text_isequal(null_string, inputstring) : false;
! PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
! PointerGetDatum(inputstring),
! is_null, 1));
}
+
+ start_posn = 1;
+ /* start_ptr points to the start_posn'th character of inputstring */
+ start_ptr = VARDATA_ANY(inputstring);
+
+ for (fldnum = 1;; fldnum++) /* field number is 1 based */
+ {
+ CHECK_FOR_INTERRUPTS();
! end_posn = text_position_next(start_posn, &state);
!
! if (end_posn == 0)
! {
! /* fetch last field */
! chunk_len = ((char *) inputstring + VARSIZE_ANY(inputstring)) - start_ptr;
! }
! else
! {
! /* fetch non-last field */
! chunk_len = charlen_to_bytelen(start_ptr, end_posn - start_posn);
! }
! /* must build a temp text datum to pass to accumArrayResult */
! result_text = cstring_to_text_with_len(start_ptr, chunk_len);
! is_null = (null_string != NULL) ? text_isequal(null_string, result_text) : false;
!
! /* stash away this field */
! astate = accumArrayResult(astate,
PointerGetDatum(result_text),
! is_null,
! TEXTOID,
! CurrentMemoryContext);
! pfree(result_text);
! if (end_posn == 0)
! break;
! start_posn = end_posn;
! start_ptr += chunk_len;
! start_posn += state.len2;
! start_ptr += charlen_to_bytelen(start_ptr, state.len2);
! }
!
! text_position_cleanup(&state);
}
+ else
+ {
+ /*
+ * simple mode
+ * when field separator is NULL, then divide chars from string.
+ */
+ inputstring_len = VARSIZE_ANY_EXHDR(inputstring);
+
+ /* returns empty array for empty string */
+ if (inputstring_len < 1)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
+
+ start_ptr = VARDATA_ANY(inputstring);
+
+ while (inputstring_len > 0)
+ {
+ int chunk_len;
+
+ chunk_len = pg_mblen(start_ptr);
+ result_text = cstring_to_text_with_len(start_ptr, chunk_len);
+
+ is_null = (null_string != NULL) ? text_isequal(null_string, result_text) : false;
+
+ /* stash away this field */
+ astate = accumArrayResult(astate,
+ PointerGetDatum(result_text),
+ is_null,
+ TEXTOID,
+ CurrentMemoryContext);
! pfree(result_text);
! start_ptr += chunk_len;
! inputstring_len -= chunk_len;
! }
! }
! PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext));
}
/*
* array_to_text
* concatenate Cstring representation of input array elements
! * using provided field separator - just only wrapper for
! * strict version of function.
*/
Datum
array_to_text(PG_FUNCTION_ARGS)
{
! return _to_text(fcinfo);
! }
!
! /*
! * to_string
! * concatenate Cstring representation of input array elements
! * using provided field separator and null string - just only
! * wrapper for non strict three params function.
! */
! Datum
! to_string(PG_FUNCTION_ARGS)
! {
! return _to_text(fcinfo);
! }
!
! /*
! * common code for array_to_text and to_string function. null_string can be
! * NULL (only for array_to_text function).
! */
! static Datum
! _to_text(PG_FUNCTION_ARGS)
! {
! ArrayType *v;
int nitems,
*dims,
ndims;
***************
*** 3085,3090 ****
--- 3206,3241 ----
int bitmask;
int i;
ArrayMetaState *my_extra;
+ char *fieldsep;
+ char *null_string;
+ text *result;
+
+
+ /* returns NULL when input parameter is NULL */
+ if (PG_ARGISNULL(0))
+ PG_RETURN_NULL();
+
+ v = PG_GETARG_ARRAYTYPE_P(0);
+
+
+ /* field separator cannot be a NULL */
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("field separator cannot be NULL")));
+
+ fieldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+ /* get a NULL string */
+ if (PG_NARGS() > 2)
+ {
+ if (!PG_ARGISNULL(2))
+ null_string = text_to_cstring(PG_GETARG_TEXT_PP(2));
+ else
+ null_string = NULL;
+ }
+ else
+ null_string = NULL;
ndims = ARR_NDIM(v);
dims = ARR_DIMS(v);
***************
*** 3092,3098 ****
/* if there are no elements, return an empty string */
if (nitems == 0)
! PG_RETURN_TEXT_P(cstring_to_text(""));
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
--- 3243,3249 ----
/* if there are no elements, return an empty string */
if (nitems == 0)
! PG_RETURN_TEXT_P(cstring_to_text_with_len("", 0));
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
***************
*** 3140,3146 ****
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
! /* we ignore nulls */
}
else
{
--- 3291,3305 ----
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
! /* we ignore nulls, when null_string isn't defined (is NULL) */
! if (null_string != NULL)
! {
! if (printed)
! appendStringInfo(&buf, "%s%s", fieldsep, null_string);
! else
! appendStringInfoString(&buf, null_string);
! printed = true;
! }
}
else
{
***************
*** 3149,3155 ****
value = OutputFunctionCall(&my_extra->proc, itemvalue);
if (printed)
! appendStringInfo(&buf, "%s%s", fldsep, value);
else
appendStringInfoString(&buf, value);
printed = true;
--- 3308,3314 ----
value = OutputFunctionCall(&my_extra->proc, itemvalue);
if (printed)
! appendStringInfo(&buf, "%s%s", fieldsep, value);
else
appendStringInfoString(&buf, value);
printed = true;
***************
*** 3169,3176 ****
}
}
}
! PG_RETURN_TEXT_P(cstring_to_text_with_len(buf.data, buf.len));
}
#define HEXBASE 16
--- 3328,3338 ----
}
}
}
+
+ result = cstring_to_text_with_len(buf.data, buf.len);
+ pfree(buf.data);
! PG_RETURN_TEXT_P(result);
}
#define HEXBASE 16
*** ./src/include/catalog/pg_proc.h.orig 2010-07-20 08:44:18.944804554 +0200
--- ./src/include/catalog/pg_proc.h 2010-07-20 11:17:39.173804133 +0200
***************
*** 1022,1027 ****
--- 1022,1031 ----
DESCR("split delimited text into text[]");
DATA(insert OID = 395 ( array_to_string PGNSP PGUID 12 1 0 0 f f f t f i 2 0 25 "2277 25" _null_ _null_ _null_ _null_ array_to_text _null_ _null_ _null_ ));
DESCR("concatenate array elements, using delimiter, into text");
+ DATA(insert OID = 950 ( to_array PGNSP PGUID 12 1 0 0 f f f f f i 3 0 1009 "25 25 25" _null_ _null_ _null_ _null_ to_array _null_ _null_ _null_ ));
+ DESCR("split delimited text into text[], possible to set null string");
+ DATA(insert OID = 951 ( to_string PGNSP PGUID 12 1 0 0 f f f f f i 3 0 25 "2277 25 25" _null_ _null_ _null_ _null_ to_string _null_ _null_ _null_ ));
+ DESCR("concatenate array elements, using delimiter, into text, possible to set null string");
DATA(insert OID = 515 ( array_larger PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 516 ( array_smaller PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_smaller _null_ _null_ _null_ ));
*** ./src/include/utils/array.h.orig 2010-07-20 08:44:18.946806422 +0200
--- ./src/include/utils/array.h 2010-07-20 08:44:59.279804706 +0200
***************
*** 274,279 ****
--- 274,280 ----
extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims);
extern Datum array_agg_transfn(PG_FUNCTION_ARGS);
*** ./src/include/utils/builtins.h.orig 2010-07-13 22:57:19.000000000 +0200
--- ./src/include/utils/builtins.h 2010-07-20 08:44:59.279804706 +0200
***************
*** 715,720 ****
--- 715,722 ----
extern Datum split_text(PG_FUNCTION_ARGS);
extern Datum text_to_array(PG_FUNCTION_ARGS);
extern Datum array_to_text(PG_FUNCTION_ARGS);
+ extern Datum to_array(PG_FUNCTION_ARGS);
+ extern Datum to_string(PG_FUNCTION_ARGS);
extern Datum to_hex32(PG_FUNCTION_ARGS);
extern Datum to_hex64(PG_FUNCTION_ARGS);
extern Datum md5_text(PG_FUNCTION_ARGS);
*** ./src/test/regress/expected/arrays.out.orig 2010-07-20 08:44:18.948805776 +0200
--- ./src/test/regress/expected/arrays.out 2010-07-20 13:31:41.000000000 +0200
***************
*** 1208,1210 ****
--- 1208,1297 ----
[5:5]={"(42,43)"}
(1 row)
+ -- check to_string and to_array functions
+ select to_array('abc','');
+ to_array
+ ----------
+ {abc}
+ (1 row)
+
+ select to_array('abc','','abc');
+ to_array
+ ----------
+ {NULL}
+ (1 row)
+
+ select to_array('abc',',');
+ to_array
+ ----------
+ {abc}
+ (1 row)
+
+ select to_array('abc',',','abc');
+ to_array
+ ----------
+ {NULL}
+ (1 row)
+
+ select to_array('1,2,3,4,,6',',');
+ to_array
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select to_array('1,2,3,4,,6',',','');
+ to_array
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select to_array('1,2,3,4,*,6',',','*');
+ to_array
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select to_array(NULL, ',') is NULL;
+ ?column?
+ ----------
+ t
+ (1 row)
+
+ select to_array('', ',');
+ to_array
+ ----------
+ {}
+ (1 row)
+
+ select to_string(NULL::int4[], ',') is NULL;
+ ?column?
+ ----------
+ t
+ (1 row)
+
+ select to_string('{}'::int4[], ',');
+ to_string
+ -----------
+
+ (1 row)
+
+ select to_string(array[1,2,3,4,NULL,6],',');
+ to_string
+ ------------
+ 1,2,3,4,,6
+ (1 row)
+
+ select to_string(array[1,2,3,4,NULL,6],',','*');
+ to_string
+ -------------
+ 1,2,3,4,*,6
+ (1 row)
+
+ select to_string(array[1,2,3,4,NULL,6],NULL); -- should fail
+ ERROR: field separator cannot be NULL
+ select to_string(array[1,2,3,4,NULL,6],',', NULL); -- ignore NULL value
+ to_string
+ -----------
+ 1,2,3,4,6
+ (1 row)
+
*** ./src/test/regress/sql/arrays.sql.orig 2010-07-20 08:44:18.949804300 +0200
--- ./src/test/regress/sql/arrays.sql 2010-07-20 13:31:25.789928470 +0200
***************
*** 412,414 ****
--- 412,432 ----
select * from t1;
update t1 set f1[5].q2 = 43;
select * from t1;
+
+ -- check to_string and to_array functions
+
+ select to_array('abc','');
+ select to_array('abc','','abc');
+ select to_array('abc',',');
+ select to_array('abc',',','abc');
+ select to_array('1,2,3,4,,6',',');
+ select to_array('1,2,3,4,,6',',','');
+ select to_array('1,2,3,4,*,6',',','*');
+ select to_array(NULL, ',') is NULL;
+ select to_array('', ',');
+ select to_string(NULL::int4[], ',') is NULL;
+ select to_string('{}'::int4[], ',');
+ select to_string(array[1,2,3,4,NULL,6],',');
+ select to_string(array[1,2,3,4,NULL,6],',','*');
+ select to_string(array[1,2,3,4,NULL,6],NULL); -- should fail
+ select to_string(array[1,2,3,4,NULL,6],',', NULL); -- ignore NULL value
2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.
We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?
If to_string and to_array is in the SQL standard, we can accept the
name changes.
But if there are no standard, I'd like to keep the existing function names.
--
Itagaki Takahiro
2010/7/21 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
I have to repeat it, the behave of this functions are little bit
different. string_to_array and array_to_string are buggy.
* it isn't support a NULL
* it doesn't differentiate a empty array and NULL
* we cannot to change default behave of existing functions
* array_to_string is badly marked as IMMUTABLE
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?
If to_string and to_array is in the SQL standard, we can accept the
name changes.
But if there are no standard, I'd like to keep the existing function names.
no it isn't in standard, but I am thinking, so we have to gently alone
a old functions
Regards
Pavel Stehule
Show quoted text
--
Itagaki Takahiro
On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?
The worst part is that the new names are not very mnemonic.
I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF(). It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry with the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array function basically does the
same thing as to_array(), and then replaces empty strings with NULL or
some other value.
Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array. That could be
useful for other things besides this particular case, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?The worst part is that the new names are not very mnemonic.
I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF(). It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry with the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array function basically does the
same thing as to_array(), and then replaces empty strings with NULL or
some other value.Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array. That could be
useful for other things besides this particular case, too.
I don't agree. Building or updating any array is little bit expensive.
There can be same performance issue like combination array_agg and
array_to_string versus string_agg. I am not against to possible name
changes. But I am strong in opinion so current string_to_array and
array_to_string are buggy and have to be deprecated.
Regards
Pavel
p.s. can we use a names - text_to_array, array_to_text ?
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?The worst part is that the new names are not very mnemonic.
I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF(). It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry with the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array function basically does the
same thing as to_array(), and then replaces empty strings with NULL or
some other value.Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array. That could be
useful for other things besides this particular case, too.I don't agree. Building or updating any array is little bit expensive.
There can be same performance issue like combination array_agg and
array_to_string versus string_agg.
But is it really bad enough to introduce custom versions of every
function that might want to do this sort of thing?
I am not against to possible name
changes. But I am strong in opinion so current string_to_array and
array_to_string are buggy and have to be deprecated.
But I don't think anyone else agrees with you. The current behavior
isn't the only one anyone might want, but it's one reasonable
behavior.
p.s. can we use a names - text_to_array, array_to_text ?
That's not going to reduce confusion one bit...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?The worst part is that the new names are not very mnemonic.
I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF(). It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry with the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array function basically does the
same thing as to_array(), and then replaces empty strings with NULL or
some other value.Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array. That could be
useful for other things besides this particular case, too.I don't agree. Building or updating any array is little bit expensive.
There can be same performance issue like combination array_agg and
array_to_string versus string_agg.But is it really bad enough to introduce custom versions of every
function that might want to do this sort of thing?I am not against to possible name
changes. But I am strong in opinion so current string_to_array and
array_to_string are buggy and have to be deprecated.But I don't think anyone else agrees with you. The current behavior
isn't the only one anyone might want, but it's one reasonable
behavior.
see on discus to these function - this is Marlin Moncure proposal
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151503.html
these functions was designed in reaction to reporting bugs and
problems with serialisation and deserialisation of arrays with null
fields.
you can't to parse string to array with null values now
postgres=# select string_to_array('1,2,3,null,5',',')::int[];
ERROR: invalid input syntax for integer: "null"
postgres=#
Regards
Pavel Stehule
Show quoted text
p.s. can we use a names - text_to_array, array_to_text ?
That's not going to reduce confusion one bit...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/21 Pavel Stehule <pavel.stehule@gmail.com>:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?The worst part is that the new names are not very mnemonic.
I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF(). It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry with the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array function basically does the
same thing as to_array(), and then replaces empty strings with NULL or
some other value.Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array. That could be
useful for other things besides this particular case, too.I don't agree. Building or updating any array is little bit expensive.
There can be same performance issue like combination array_agg and
array_to_string versus string_agg.But is it really bad enough to introduce custom versions of every
function that might want to do this sort of thing?
please look on http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151475.html
I am not alone in opinion so current string to array functions has
not good design
Regards
Pavel
Show quoted text
I am not against to possible name
changes. But I am strong in opinion so current string_to_array and
array_to_string are buggy and have to be deprecated.But I don't think anyone else agrees with you. The current behavior
isn't the only one anyone might want, but it's one reasonable
behavior.see on discus to these function - this is Marlin Moncure proposal
http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151503.html
these functions was designed in reaction to reporting bugs and
problems with serialisation and deserialisation of arrays with null
fields.you can't to parse string to array with null values now
postgres=# select string_to_array('1,2,3,null,5',',')::int[];
ERROR: invalid input syntax for integer: "null"
postgres=#Regards
Pavel Stehule
p.s. can we use a names - text_to_array, array_to_text ?
That's not going to reduce confusion one bit...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 8:14 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Pavel Stehule <pavel.stehule@gmail.com>:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 7:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:39 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).
Do we think it is good idea to have different names for them? IMHO, we'd
better use 3 arguments version of array_to_string() instead of the
new to_string() ?The worst part is that the new names are not very mnemonic.
I think maybe what we really need here is array equivalents of
COALESCE() and NULLIF(). It looks like the proposed to_string()
function is basically equivalent to replacing each NULL entry with the
array with a given value, and then doing array_to_string() as usual.
And it looks like the proposed to_array function basically does the
same thing as to_array(), and then replaces empty strings with NULL or
some other value.Maybe we just need a function array_replace(anyarray, anyelement,
anyelement) that replaces any element in the array that IS NOT
DISTINCT FROM $2 with $3 and returns the new array. That could be
useful for other things besides this particular case, too.I don't agree. Building or updating any array is little bit expensive.
There can be same performance issue like combination array_agg and
array_to_string versus string_agg.But is it really bad enough to introduce custom versions of every
function that might want to do this sort of thing?please look on http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg151475.html
I am not alone in opinion so current string to array functions has
not good design
OK, I stand corrected, although I'm not totally convinced. I still
think to_array() and to_string() are not a good choice of names. I am
not sure if we should reuse the existing names (adding a third
parameter) or pick something else, like array_concat() and
split_to_array().
Also, should we consider putting these in contrib/stringfunc rather
than core? Or is there enough support for core that we should stick
with doing it that way?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
OK, I stand corrected, although I'm not totally convinced. I still
think to_array() and to_string() are not a good choice of names. I am
not sure if we should reuse the existing names (adding a third
parameter) or pick something else, like array_concat() and
split_to_array().
It was discussed before. I would to see some symmetry in names. The
bad thing is so great names like string_to_array and array_to_string
is used, and second bad thing was done three years ago when nobody
thinking about NULL values. I don't think, so we are able to repair
older functions - simply the default behave isn't optimal.
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.
Also, should we consider putting these in contrib/stringfunc rather
than core? Or is there enough support for core that we should stick
with doing it that way?
so it is one variant. I am not against to moving these function to
contrib/stringfunc.
I am thinking, so we have to solve question about marking
string_to_array and array_to_string functions as deprecated first.
Then we can move forward?? My opinion is known - I am for removing of
these function in future and replacing by modernized functions.
Others opinions???
Can we move forward?
Regards
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
It was discussed before. I would to see some symmetry in names.
That's reasonable.
The
bad thing is so great names like string_to_array and array_to_string
is used,
Yeah, those names are not too good.
and second bad thing was done three years ago when nobody
thinking about NULL values. I don't think, so we are able to repair
older functions - simply the default behave isn't optimal.
This is a matter of opinion, but certainly it's not right for everyone.
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.
Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On 22 July 2010 01:55, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first.Well, -1 from me for deprecating string_to_array and array_to_string.
For what it's worth, I agree with Pavel about the current behaviour in
core. It's broken whenever NULLs come into play. We need to improve
on this one way or another, and I think it would be a shame to deal
with a problem in core by adding something to contrib.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.
What about implode() and explode()? It's got symmetry and it's
possibly more descriptive.
Cheers,
BJ
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
It was discussed before. I would to see some symmetry in names.
That's reasonable.
The
bad thing is so great names like string_to_array and array_to_string
is used,Yeah, those names are not too good.
and second bad thing was done three years ago when nobody
thinking about NULL values. I don't think, so we are able to repair
older functions - simply the default behave isn't optimal.This is a matter of opinion, but certainly it's not right for everyone.
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.
I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 12:08 PM, Brendan Jurd <direvus@gmail.com> wrote:
On 22 July 2010 01:55, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jul 21, 2010 at 9:39 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first.Well, -1 from me for deprecating string_to_array and array_to_string.
For what it's worth, I agree with Pavel about the current behaviour in
core. It's broken whenever NULLs come into play. We need to improve
on this one way or another, and I think it would be a shame to deal
with a problem in core by adding something to contrib.
Fair enough. I'm OK with putting it in core if we can come up with
suitable names.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.What about implode() and explode()? It's got symmetry and it's
possibly more descriptive.
Hmm, it's a thought.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.
Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how. And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.
array_split() and array_join(), following Perl? array_implode() and
array_explode(), along the lines suggested by Brendan?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Jul 21, 2010, at 12:30 , Robert Haas wrote:
array_split() and array_join(), following Perl?
+1. Seems common in other languages such as Ruby, Python, and Java as well.
Michael Glaesemann
grzm seespotcode net
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how. And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.array_split() and array_join(), following Perl? array_implode() and
array_explode(), along the lines suggested by Brendan?
I have a problem with array_split - because there string is split. I
looked on net - and languages usually uses a "split" or "join". split
is method of str class in Java. So when I am following Perl, I feel
better with just only "split" and "join", but "join" is keyword :( -
step back, maybe string_split X array_join ?
select string_split('1,2,3,4',',');
select array_join(array[1,2,3,4],',');
so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.
Pavel
p.s. It is typical use case for packages - with it, we can have the
functions string.split() and array.join()
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 1:48 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how. And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.array_split() and array_join(), following Perl? array_implode() and
array_explode(), along the lines suggested by Brendan?I have a problem with array_split - because there string is split. I
looked on net - and languages usually uses a "split" or "join". split
is method of str class in Java. So when I am following Perl, I feel
better with just only "split" and "join", but "join" is keyword :( -
step back, maybe string_split X array_join ?select string_split('1,2,3,4',',');
select array_join(array[1,2,3,4],',');so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.
Well, I guess I prefer my suggestion to any of those (I know... what a
surprise), but I think I could live with #3, #4, or #5. It's hard for
me to imagine that we really want to create a function called just
join(), given the other meanings that JOIN already has in SQL.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 1:48 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how. And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.array_split() and array_join(), following Perl? array_implode() and
array_explode(), along the lines suggested by Brendan?I have a problem with array_split - because there string is split. I
looked on net - and languages usually uses a "split" or "join". split
is method of str class in Java. So when I am following Perl, I feel
better with just only "split" and "join", but "join" is keyword :( -
step back, maybe string_split X array_join ?select string_split('1,2,3,4',',');
select array_join(array[1,2,3,4],',');so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.Well, I guess I prefer my suggestion to any of those (I know... what a
surprise), but I think I could live with #3, #4, or #5. It's hard for
me to imagine that we really want to create a function called just
join(), given the other meanings that JOIN already has in SQL.
it hasn't any relation to SQL language - but I don't expect so some
like this can be accepted by Tom :). So for this moment we are in
agreement on #3, #4, #5. I think, we can wait one or two days for
opinions of others - and than I'll fix patch. ok?
Regards
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 2:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 1:48 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how. And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.array_split() and array_join(), following Perl? array_implode() and
array_explode(), along the lines suggested by Brendan?I have a problem with array_split - because there string is split. I
looked on net - and languages usually uses a "split" or "join". split
is method of str class in Java. So when I am following Perl, I feel
better with just only "split" and "join", but "join" is keyword :( -
step back, maybe string_split X array_join ?select string_split('1,2,3,4',',');
select array_join(array[1,2,3,4],',');so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.Well, I guess I prefer my suggestion to any of those (I know... what a
surprise), but I think I could live with #3, #4, or #5. It's hard for
me to imagine that we really want to create a function called just
join(), given the other meanings that JOIN already has in SQL.it hasn't any relation to SQL language - but I don't expect so some
like this can be accepted by Tom :). So for this moment we are in
agreement on #3, #4, #5. I think, we can wait one or two days for
opinions of others - and than I'll fix patch. ok?
Yeah, I'd like some more votes, too. Aside from what I suggested
(array_join/array_split), I think my favorite is your #5.
We might also want to put some work into documentating the differences
between the old and new functions clearly.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 2:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 1:48 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2010/7/21 Robert Haas <robertmhaas@gmail.com>:
On Wed, Jul 21, 2010 at 12:08 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I am thinking so we have to do decision about string_to_array and
array_to_string deprecation first. If these function will be
deprecated, then we can use a similar names (and probably we should to
use a similar names) - so text_to_array or array_to_string can be
acceptable. If not, then this discus is needless - then to_string and
to_array have to be maximally in contrib - stringfunc is good idea -
and maybe we don't need thinking about new names.Well, -1 from me for deprecating string_to_array and array_to_string.
I am not in favor of the names to_string and to_array even if we put
them in contrib, though. The problem with string_to_array and
array_to_string is that they aren't descriptive enough, and
to_string/to_array is even less so.I am not a English native speaker, so I have a different feeling.
These functions do array_serialisation and array_deseralisation, but
this names are too long. I have not idea about better names - it is
descriptive well (for me) text->array, array->text - and these names
shows very cleanly symmetry between functions. I have to repeat - it
is very clean for not native speaker.Well, the problem is that array_to_string(), for example, tells you
that an array is being converted to a string, but not how. And
to_string() tells you that you're getting a string, but it doesn't
tell you either what you're getting it from or how you're getting it.
We already have a function to_char() which can be used to format a
whole bunch of different types as strings; I can't see adding a new
function with almost the same name that does something completely
different.array_split() and array_join(), following Perl? array_implode() and
array_explode(), along the lines suggested by Brendan?I have a problem with array_split - because there string is split. I
looked on net - and languages usually uses a "split" or "join". split
is method of str class in Java. So when I am following Perl, I feel
better with just only "split" and "join", but "join" is keyword :( -
step back, maybe string_split X array_join ?select string_split('1,2,3,4',',');
select array_join(array[1,2,3,4],',');so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.Well, I guess I prefer my suggestion to any of those (I know... what a
surprise), but I think I could live with #3, #4, or #5. It's hard for
me to imagine that we really want to create a function called just
join(), given the other meanings that JOIN already has in SQL.it hasn't any relation to SQL language - but I don't expect so some
like this can be accepted by Tom :). So for this moment we are in
agreement on #3, #4, #5. I think, we can wait one or two days for
opinions of others - and than I'll fix patch. ok?Yeah, I'd like some more votes, too. Aside from what I suggested
(array_join/array_split), I think my favorite is your #5.
ok
#5 - it is absolutely out of me - explode, implode are used in Czech
only with relation to bombs. In this moment I have a problem to decide
what is related to string_to_array and array_to_string - it is nothing
against to your opinion, just it means, so it hasn't any meaning for
me - and probably for lot of foreign developers. But I found on net,
that people use this names.
We might also want to put some work into documentating the differences
between the old and new functions clearly.
sure
Pavel
Show quoted text
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 21, 2010 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I'd like some more votes, too. Aside from what I suggested
(array_join/array_split), I think my favorite is your #5.
-1 for me for any name that is of the form of:
type_operation();
we don't have bytea_encode, array_unnest(), date_to_char(), etc. the
non-internal ones that we do have (mostly array funcs), are improperly
named imo. this is sql, not c. suppose we want to extend string
serialization to row types?
why not serialize/unserialize?
merlin
On Wed, Jul 21, 2010 at 02:38:17PM -0400, Merlin Moncure wrote:
On Wed, Jul 21, 2010 at 2:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Yeah, I'd like some more votes, too. �Aside from what I suggested
(array_join/array_split), I think my favorite is your #5.-1 for me for any name that is of the form of:
type_operation();we don't have bytea_encode, array_unnest(), date_to_char(), etc. the
non-internal ones that we do have (mostly array funcs), are improperly
named imo. this is sql, not c. suppose we want to extend string
serialization to row types?why not serialize/unserialize?
Because that's not what the function actually does.
FWIW, I'm for (im|ex)plode, as join()/split(), which I'd otherwise
like, would run into too many issues with JOIN.
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Robert Haas <robertmhaas@gmail.com> writes:
so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.Yeah, I'd like some more votes, too.
I still don't see a compelling reason not to extend existing functions
with a third argument. Or are we talking about deprecating them in the
future (like remove their mention in the docs) and have the new names to
replace them, with the new behavior as the default and the extended call
convention as the old behavior?
I'm not sure about that, so I think extending existing function is ok.
Or we would have to have the new functions work well with other types
too, so that it's compelling to move from the old ones.
Regards,
--
dim
2010/7/23 Dimitri Fontaine <dfontaine@hi-media.com>:
Robert Haas <robertmhaas@gmail.com> writes:
so my preferences:
1. split, join - I checked - we are able to create "join" function
2. split, array_join - when only "join" can be a problem
3. string_split, array_join - there are not clean symmetry, but it
respect wide used a semantics - string.split, array.join
4. explode, implode
5. array_explode, array_implode
-- I cannot to like array_split - it is contradiction for me.Yeah, I'd like some more votes, too.
I still don't see a compelling reason not to extend existing functions
with a third argument. Or are we talking about deprecating them in the
future (like remove their mention in the docs) and have the new names to
replace them, with the new behavior as the default and the extended call
convention as the old behavior?
just incomplete default behave :(. We can enhance old functions, but
we cannot to change default behave - it is mean, so we will to ignore
a NULLs in arrays forever - but it isn't true a three years. It is a
feature now. Please look to archive. There was a discus about it.
I'm not sure about that, so I think extending existing function is ok.
Or we would have to have the new functions work well with other types
too, so that it's compelling to move from the old ones.
I would not to replace or enhance a to_char function. I plan to use a
"implode", "explode" names
Regards
Pavel Stehule
Show quoted text
Regards,
--
dim
Hello Dimitry
I still don't see a compelling reason not to extend existing functions
with a third argument. Or are we talking about deprecating them in the
future (like remove their mention in the docs) and have the new names to
replace them, with the new behavior as the default and the extended call
convention as the old behavior?just incomplete default behave :(. We can enhance old functions, but
we cannot to change default behave - it is mean, so we will to ignore
a NULLs in arrays forever - but it isn't true a three years. It is a
feature now. Please look to archive. There was a discus about it.
The reason, why I am strong in change of default behave is only one -
I know only one use-case for curent behave - when array_to_string
ignore a NULL, - when you would to remove NULLs from array, you can do
string_to_array(array_to_string(x,'###'), '###') - I don't know other
use-case. When I have a NULL in array, then it have a some sense
there. And I can remove NULLs from array via more secure and faster
way
SELECT array(SELECT v FROM unnest(x) g(x) WHERE v IS NOT NULL)
using string_to_array and array_to_string is slower and for some
domains can be wrong (for text).
Regards
Pavel
p.s. I expect so anybody who has NULLs in an array not only for a joy.
Hello
I am sending a actualised patch. There is only one significant change
to last patch. Function to_string was renamed to "implode" and
to_array was renamed "explode".
Regards
Pavel Stehule
Attachments:
implode.difftext/x-patch; charset=US-ASCII; name=implode.diffDownload
*** ./doc/src/sgml/func.sgml.orig 2010-07-23 21:18:04.698690857 +0200
--- ./doc/src/sgml/func.sgml 2010-07-23 21:51:08.860689007 +0200
***************
*** 4652,4658 ****
</para>
<para>
! If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
--- 4652,4658 ----
</para>
<para>
! If case-independent matching is specified,
the effect is much as if all case distinctions had vanished from the
alphabet.
When an alphabetic that exists in multiple cases appears as an
***************
*** 9541,9546 ****
--- 9541,9552 ----
<primary>array_upper</primary>
</indexterm>
<indexterm>
+ <primary>explode</primary>
+ </indexterm>
+ <indexterm>
+ <primary>implode</primary>
+ </indexterm>
+ <indexterm>
<primary>string_to_array</primary>
</indexterm>
<indexterm>
***************
*** 9675,9680 ****
--- 9681,9708 ----
<row>
<entry>
<literal>
+ <function>explode</function>(<type>text</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text[]</type></entry>
+ <entry>splits string into array elements using supplied delimiter and null string</entry>
+ <entry><literal>exolode('1,2,3,,5', ',')</literal></entry>
+ <entry><literal>{1,2,3,4,NULL,5}</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
+ <function>implode</function>(<type>anyarray</type>, <type>text</type> <optional>, <type>text</type></optional>)
+ </literal>
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>concatenates array elements using supplied delimiter and null string</entry>
+ <entry><literal>implode(ARRAY[1, 2, 3, NULL, 5], ',', '*')</literal></entry>
+ <entry><literal>1,2,3,*,5</literal></entry>
+ </row>
+ <row>
+ <entry>
+ <literal>
<function>string_to_array</function>(<type>text</type>, <type>text</type>)
</literal>
</entry>
*** ./src/backend/catalog/system_views.sql.orig 2010-07-23 21:18:04.806852641 +0200
--- ./src/backend/catalog/system_views.sql 2010-07-23 22:03:56.329687977 +0200
***************
*** 487,489 ****
--- 487,497 ----
CREATE OR REPLACE FUNCTION
pg_start_backup(label text, fast boolean DEFAULT false)
RETURNS text STRICT VOLATILE LANGUAGE internal AS 'pg_start_backup';
+
+ CREATE OR REPLACE FUNCTION
+ implode(v anyarray, fldsep text, null_string text DEFAULT '')
+ RETURNS text STABLE LANGUAGE internal AS 'implode';
+
+ CREATE OR REPLACE FUNCTION
+ explode(inputstr text, fldsep text, null_string text DEFAULT '')
+ RETURNS text[] IMMUTABLE LANGUAGE internal AS 'explode';
*** ./src/backend/utils/adt/array_userfuncs.c.orig 2010-07-23 21:18:04.880689496 +0200
--- ./src/backend/utils/adt/array_userfuncs.c 2010-07-23 21:18:36.467693435 +0200
***************
*** 407,415 ****
--- 407,417 ----
create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims)
{
Datum dvalues[1];
+ bool nulls[1];
int16 typlen;
bool typbyval;
char typalign;
***************
*** 429,434 ****
--- 431,437 ----
ndims, MAXDIM)));
dvalues[0] = element;
+ nulls[0] = isNull;
for (i = 0; i < ndims; i++)
{
***************
*** 462,468 ****
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
! return construct_md_array(dvalues, NULL, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
--- 465,471 ----
typbyval = my_extra->typbyval;
typalign = my_extra->typalign;
! return construct_md_array(dvalues, nulls, ndims, dims, lbs, element_type,
typlen, typbyval, typalign);
}
*** ./src/backend/utils/adt/varlena.c.orig 2010-07-23 21:18:04.911693316 +0200
--- ./src/backend/utils/adt/varlena.c 2010-07-23 22:00:47.832690000 +0200
***************
*** 75,80 ****
--- 75,83 ----
static bytea *bytea_overlay(bytea *t1, bytea *t2, int sp, int sl);
static StringInfo makeStringAggState(FunctionCallInfo fcinfo);
+ static Datum _explode(PG_FUNCTION_ARGS);
+ static Datum _implode(PG_FUNCTION_ARGS);
+
/*****************************************************************************
* CONVERSION ROUTINES EXPORTED FOR USE BY C CODE *
***************
*** 2965,2970 ****
--- 2968,2984 ----
}
/*
+ * Returns true when two text params are same.
+ */
+ static
+ bool text_isequal(text *txt1, text *txt2)
+ {
+ return DatumGetBool(DirectFunctionCall2(texteq,
+ PointerGetDatum(txt1),
+ PointerGetDatum(txt2)));
+ }
+
+ /*
* text_to_array
* parse input string
* return text array of elements
***************
*** 2973,3076 ****
Datum
text_to_array(PG_FUNCTION_ARGS)
{
! text *inputstring = PG_GETARG_TEXT_PP(0);
! text *fldsep = PG_GETARG_TEXT_PP(1);
int inputstring_len;
- int fldsep_len;
- TextPositionState state;
- int fldnum;
- int start_posn;
- int end_posn;
- int chunk_len;
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
! text_position_setup(inputstring, fldsep, &state);
!
! /*
! * Note: we check the converted string length, not the original, because
! * they could be different if the input contained invalid encoding.
! */
! inputstring_len = state.len1;
! fldsep_len = state.len2;
!
! /* return NULL for empty input string */
! if (inputstring_len < 1)
! {
! text_position_cleanup(&state);
PG_RETURN_NULL();
! }
! /*
! * empty field separator return one element, 1D, array using the input
! * string
*/
! if (fldsep_len < 1)
! {
! text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
! PointerGetDatum(inputstring), 1));
! }
!
! start_posn = 1;
! /* start_ptr points to the start_posn'th character of inputstring */
! start_ptr = VARDATA_ANY(inputstring);
!
! for (fldnum = 1;; fldnum++) /* field number is 1 based */
{
! CHECK_FOR_INTERRUPTS();
!
! end_posn = text_position_next(start_posn, &state);
! if (end_posn == 0)
{
! /* fetch last field */
! chunk_len = ((char *) inputstring + VARSIZE_ANY(inputstring)) - start_ptr;
}
! else
{
! /* fetch non-last field */
! chunk_len = charlen_to_bytelen(start_ptr, end_posn - start_posn);
}
! /* must build a temp text datum to pass to accumArrayResult */
! result_text = cstring_to_text_with_len(start_ptr, chunk_len);
! /* stash away this field */
! astate = accumArrayResult(astate,
PointerGetDatum(result_text),
! false,
! TEXTOID,
! CurrentMemoryContext);
! pfree(result_text);
! if (end_posn == 0)
! break;
! start_posn = end_posn;
! start_ptr += chunk_len;
! start_posn += fldsep_len;
! start_ptr += charlen_to_bytelen(start_ptr, fldsep_len);
}
! text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
! CurrentMemoryContext));
}
/*
* array_to_text
* concatenate Cstring representation of input array elements
! * using provided field separator
*/
Datum
array_to_text(PG_FUNCTION_ARGS)
{
! ArrayType *v = PG_GETARG_ARRAYTYPE_P(0);
! char *fldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
int nitems,
*dims,
ndims;
--- 2987,3197 ----
Datum
text_to_array(PG_FUNCTION_ARGS)
{
! text *inputstr = PG_GETARG_TEXT_PP(0);
!
! /* function string_to_array returns a NULL for empty string */
! if (VARSIZE_ANY_EXHDR(inputstr) < 1)
! PG_RETURN_NULL();
!
! return _explode(fcinfo);
! }
!
! /*
! * just only outer wrapper
! */
! Datum
! explode(PG_FUNCTION_ARGS)
! {
! return _explode(fcinfo);
! }
!
! /*
! * _to_array - common code for to_array and text_to_array functions
! */
! static Datum
! _explode(PG_FUNCTION_ARGS)
! {
int inputstring_len;
char *start_ptr;
text *result_text;
ArrayBuildState *astate = NULL;
+ bool is_null;
+ text *inputstring;
+ text *fldsep;
+ text *null_string;
! /* when input string is NULL, then result is NULL too */
! if (PG_ARGISNULL(0))
PG_RETURN_NULL();
!
! inputstring = PG_GETARG_TEXT_PP(0);
! /*
! * when null string is NULL, there are not string representation for NULL,
! * the behave is compatible old text_to_array function. to_array function has
! * three arguments - last argument has default value. text_to_array function
! * has two arguments. When null_string is NULL, then behave is same like
! * old text_to_array (string_to_array) function.
*/
! if (PG_NARGS() > 2)
! null_string = (!PG_ARGISNULL(2)) ? PG_GETARG_TEXT_PP(2) : NULL;
! else
! null_string = NULL;
!
! /*
! * when separator is NULL, then returns every character as field,
! * there are no any character used as separator - etc separator is
! * space between chars.
! */
! if (!PG_ARGISNULL(1))
{
! TextPositionState state;
! int fldnum;
! int start_posn;
! int end_posn;
! int chunk_len;
!
! fldsep = PG_GETARG_TEXT_PP(1);
!
! text_position_setup(inputstring, fldsep, &state);
! /* return empty array for empty input string */
! if (state.len1 < 1)
{
! text_position_cleanup(&state);
! PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
}
!
! /*
! * empty field separator return one element, 1D, array using the input
! * string
! */
! if (state.len2 < 1)
{
! text_position_cleanup(&state);
! /* single element can be a NULL too */
! is_null = (null_string != NULL) ? text_isequal(null_string, inputstring) : false;
! PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
! PointerGetDatum(inputstring),
! is_null, 1));
}
+
+ start_posn = 1;
+ /* start_ptr points to the start_posn'th character of inputstring */
+ start_ptr = VARDATA_ANY(inputstring);
+
+ for (fldnum = 1;; fldnum++) /* field number is 1 based */
+ {
+ CHECK_FOR_INTERRUPTS();
! end_posn = text_position_next(start_posn, &state);
!
! if (end_posn == 0)
! {
! /* fetch last field */
! chunk_len = ((char *) inputstring + VARSIZE_ANY(inputstring)) - start_ptr;
! }
! else
! {
! /* fetch non-last field */
! chunk_len = charlen_to_bytelen(start_ptr, end_posn - start_posn);
! }
! /* must build a temp text datum to pass to accumArrayResult */
! result_text = cstring_to_text_with_len(start_ptr, chunk_len);
! is_null = (null_string != NULL) ? text_isequal(null_string, result_text) : false;
!
! /* stash away this field */
! astate = accumArrayResult(astate,
PointerGetDatum(result_text),
! is_null,
! TEXTOID,
! CurrentMemoryContext);
! pfree(result_text);
! if (end_posn == 0)
! break;
! start_posn = end_posn;
! start_ptr += chunk_len;
! start_posn += state.len2;
! start_ptr += charlen_to_bytelen(start_ptr, state.len2);
! }
!
! text_position_cleanup(&state);
}
+ else
+ {
+ /*
+ * simple mode
+ * when field separator is NULL, then divide chars from string.
+ */
+ inputstring_len = VARSIZE_ANY_EXHDR(inputstring);
+
+ /* returns empty array for empty string */
+ if (inputstring_len < 1)
+ PG_RETURN_ARRAYTYPE_P(construct_empty_array(TEXTOID));
+
+ start_ptr = VARDATA_ANY(inputstring);
+
+ while (inputstring_len > 0)
+ {
+ int chunk_len;
+
+ chunk_len = pg_mblen(start_ptr);
+ result_text = cstring_to_text_with_len(start_ptr, chunk_len);
+
+ is_null = (null_string != NULL) ? text_isequal(null_string, result_text) : false;
+
+ /* stash away this field */
+ astate = accumArrayResult(astate,
+ PointerGetDatum(result_text),
+ is_null,
+ TEXTOID,
+ CurrentMemoryContext);
! pfree(result_text);
! start_ptr += chunk_len;
! inputstring_len -= chunk_len;
! }
! }
! PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate, CurrentMemoryContext));
}
/*
* array_to_text
* concatenate Cstring representation of input array elements
! * using provided field separator - just only wrapper for
! * strict version of function.
*/
Datum
array_to_text(PG_FUNCTION_ARGS)
{
! return _implode(fcinfo);
! }
!
! /*
! * implode
! * concatenate Cstring representation of input array elements
! * using provided field separator and null string - just only
! * wrapper for non strict three params function.
! */
! Datum
! implode(PG_FUNCTION_ARGS)
! {
! return _implode(fcinfo);
! }
!
! /*
! * common code for array_to_text and to_string function. null_string can be
! * NULL (only for array_to_text function).
! */
! static Datum
! _implode(PG_FUNCTION_ARGS)
! {
! ArrayType *v;
int nitems,
*dims,
ndims;
***************
*** 3085,3090 ****
--- 3206,3241 ----
int bitmask;
int i;
ArrayMetaState *my_extra;
+ char *fieldsep;
+ char *null_string;
+ text *result;
+
+
+ /* returns NULL when input parameter is NULL */
+ if (PG_ARGISNULL(0))
+ PG_RETURN_NULL();
+
+ v = PG_GETARG_ARRAYTYPE_P(0);
+
+
+ /* field separator cannot be a NULL */
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("field separator cannot be NULL")));
+
+ fieldsep = text_to_cstring(PG_GETARG_TEXT_PP(1));
+
+ /* get a NULL string */
+ if (PG_NARGS() > 2)
+ {
+ if (!PG_ARGISNULL(2))
+ null_string = text_to_cstring(PG_GETARG_TEXT_PP(2));
+ else
+ null_string = NULL;
+ }
+ else
+ null_string = NULL;
ndims = ARR_NDIM(v);
dims = ARR_DIMS(v);
***************
*** 3092,3098 ****
/* if there are no elements, return an empty string */
if (nitems == 0)
! PG_RETURN_TEXT_P(cstring_to_text(""));
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
--- 3243,3249 ----
/* if there are no elements, return an empty string */
if (nitems == 0)
! PG_RETURN_TEXT_P(cstring_to_text_with_len("", 0));
element_type = ARR_ELEMTYPE(v);
initStringInfo(&buf);
***************
*** 3140,3146 ****
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
! /* we ignore nulls */
}
else
{
--- 3291,3305 ----
/* Get source element, checking for NULL */
if (bitmap && (*bitmap & bitmask) == 0)
{
! /* we ignore nulls, when null_string isn't defined (is NULL) */
! if (null_string != NULL)
! {
! if (printed)
! appendStringInfo(&buf, "%s%s", fieldsep, null_string);
! else
! appendStringInfoString(&buf, null_string);
! printed = true;
! }
}
else
{
***************
*** 3149,3155 ****
value = OutputFunctionCall(&my_extra->proc, itemvalue);
if (printed)
! appendStringInfo(&buf, "%s%s", fldsep, value);
else
appendStringInfoString(&buf, value);
printed = true;
--- 3308,3314 ----
value = OutputFunctionCall(&my_extra->proc, itemvalue);
if (printed)
! appendStringInfo(&buf, "%s%s", fieldsep, value);
else
appendStringInfoString(&buf, value);
printed = true;
***************
*** 3169,3176 ****
}
}
}
! PG_RETURN_TEXT_P(cstring_to_text_with_len(buf.data, buf.len));
}
#define HEXBASE 16
--- 3328,3338 ----
}
}
}
+
+ result = cstring_to_text_with_len(buf.data, buf.len);
+ pfree(buf.data);
! PG_RETURN_TEXT_P(result);
}
#define HEXBASE 16
*** ./src/include/catalog/pg_proc.h.orig 2010-07-23 21:18:04.970977412 +0200
--- ./src/include/catalog/pg_proc.h 2010-07-23 21:53:09.567690848 +0200
***************
*** 1022,1027 ****
--- 1022,1031 ----
DESCR("split delimited text into text[]");
DATA(insert OID = 395 ( array_to_string PGNSP PGUID 12 1 0 0 f f f t f i 2 0 25 "2277 25" _null_ _null_ _null_ _null_ array_to_text _null_ _null_ _null_ ));
DESCR("concatenate array elements, using delimiter, into text");
+ DATA(insert OID = 950 ( explode PGNSP PGUID 12 1 0 0 f f f f f i 3 0 1009 "25 25 25" _null_ _null_ _null_ _null_ explode _null_ _null_ _null_ ));
+ DESCR("split delimited text into text[], possible to set null string");
+ DATA(insert OID = 951 ( implode PGNSP PGUID 12 1 0 0 f f f f f i 3 0 25 "2277 25 25" _null_ _null_ _null_ _null_ implode _null_ _null_ _null_ ));
+ DESCR("concatenate array elements, using delimiter, into text, possible to set null string");
DATA(insert OID = 515 ( array_larger PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_larger _null_ _null_ _null_ ));
DESCR("larger of two");
DATA(insert OID = 516 ( array_smaller PGNSP PGUID 12 1 0 0 f f f t f i 2 0 2277 "2277 2277" _null_ _null_ _null_ _null_ array_smaller _null_ _null_ _null_ ));
*** ./src/include/utils/array.h.orig 2010-07-23 21:18:05.025427635 +0200
--- ./src/include/utils/array.h 2010-07-23 21:18:36.532691416 +0200
***************
*** 274,279 ****
--- 274,280 ----
extern ArrayType *create_singleton_array(FunctionCallInfo fcinfo,
Oid element_type,
Datum element,
+ bool isNull,
int ndims);
extern Datum array_agg_transfn(PG_FUNCTION_ARGS);
*** ./src/include/utils/builtins.h.orig 2010-07-22 03:22:35.000000000 +0200
--- ./src/include/utils/builtins.h 2010-07-23 21:52:27.718688550 +0200
***************
*** 716,721 ****
--- 716,723 ----
extern Datum split_text(PG_FUNCTION_ARGS);
extern Datum text_to_array(PG_FUNCTION_ARGS);
extern Datum array_to_text(PG_FUNCTION_ARGS);
+ extern Datum exlode(PG_FUNCTION_ARGS);
+ extern Datum implode(PG_FUNCTION_ARGS);
extern Datum to_hex32(PG_FUNCTION_ARGS);
extern Datum to_hex64(PG_FUNCTION_ARGS);
extern Datum md5_text(PG_FUNCTION_ARGS);
*** ./src/test/regress/expected/arrays.out.orig 2010-07-23 21:18:05.106040250 +0200
--- ./src/test/regress/expected/arrays.out 2010-07-23 22:06:16.000000000 +0200
***************
*** 1208,1210 ****
--- 1208,1297 ----
[5:5]={"(42,43)"}
(1 row)
+ -- check to_string and to_array functions
+ select explode('abc','');
+ explode
+ ---------
+ {abc}
+ (1 row)
+
+ select explode('abc','','abc');
+ explode
+ ---------
+ {NULL}
+ (1 row)
+
+ select explode('abc',',');
+ explode
+ ---------
+ {abc}
+ (1 row)
+
+ select explode('abc',',','abc');
+ explode
+ ---------
+ {NULL}
+ (1 row)
+
+ select explode('1,2,3,4,,6',',');
+ explode
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select explode('1,2,3,4,,6',',','');
+ explode
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select explode('1,2,3,4,*,6',',','*');
+ explode
+ ------------------
+ {1,2,3,4,NULL,6}
+ (1 row)
+
+ select explode(NULL, ',') is NULL;
+ ?column?
+ ----------
+ t
+ (1 row)
+
+ select explode('', ',');
+ explode
+ ---------
+ {}
+ (1 row)
+
+ select implode(NULL::int4[], ',') is NULL;
+ ?column?
+ ----------
+ t
+ (1 row)
+
+ select implode('{}'::int4[], ',');
+ implode
+ ---------
+
+ (1 row)
+
+ select implode(array[1,2,3,4,NULL,6],',');
+ implode
+ ------------
+ 1,2,3,4,,6
+ (1 row)
+
+ select implode(array[1,2,3,4,NULL,6],',','*');
+ implode
+ -------------
+ 1,2,3,4,*,6
+ (1 row)
+
+ select implode(array[1,2,3,4,NULL,6],NULL); -- should fail
+ ERROR: field separator cannot be NULL
+ select implode(array[1,2,3,4,NULL,6],',', NULL); -- ignore NULL value
+ implode
+ -----------
+ 1,2,3,4,6
+ (1 row)
+
*** ./src/test/regress/sql/arrays.sql.orig 2010-07-23 21:18:05.123458199 +0200
--- ./src/test/regress/sql/arrays.sql 2010-07-23 21:55:46.646687842 +0200
***************
*** 412,414 ****
--- 412,432 ----
select * from t1;
update t1 set f1[5].q2 = 43;
select * from t1;
+
+ -- check to_string and to_array functions
+
+ select explode('abc','');
+ select explode('abc','','abc');
+ select explode('abc',',');
+ select explode('abc',',','abc');
+ select explode('1,2,3,4,,6',',');
+ select explode('1,2,3,4,,6',',','');
+ select explode('1,2,3,4,*,6',',','*');
+ select explode(NULL, ',') is NULL;
+ select explode('', ',');
+ select implode(NULL::int4[], ',') is NULL;
+ select implode('{}'::int4[], ',');
+ select implode(array[1,2,3,4,NULL,6],',');
+ select implode(array[1,2,3,4,NULL,6],',','*');
+ select implode(array[1,2,3,4,NULL,6],NULL); -- should fail
+ select implode(array[1,2,3,4,NULL,6],',', NULL); -- ignore NULL value
Apparently, the message I sent (quoted below) didn't make it to
-hackers. I know that Pavel received the message, as he replied to
it. I'm calling shenanigans on the mailing list server, but in the
meantime, here are those diffs again.
Show quoted text
On 31 July 2010 07:37, Brendan Jurd <direvus@gmail.com> wrote:
Hi Pavel,
I've reviewed your latest patch (which I refer to as v3 to keep
continuity with previous versions under the "to_array" naming system).You didn't quite complete the rename of the functions; in-code
comments and regression tests still referred to the old names. I
cleanup that up for you and also reworded some of the in-code comments
for clarity.Otherwise the patch looks good and the functions now work exactly as I
would expect.I also went ahead and added some more documentation to explain how
(im|ex)plode differ from their foo_to_bar counterparts, and what kind
of behaviour you'll get by specifying the arguments as NULL.I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Cheers,
BJ
Import Notes
Reply to msg id not found: AANLkTinpYDT9XDfeqE7VxpwV0X4tNAzt5HR3_x1wOWN8@mail.gmail.com
Pavel Stehule wrote:
2010/7/21 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
2010/7/20 Pavel Stehule <pavel.stehule@gmail.com>:
here is a new version - new these functions are not a strict and
function to_string is marked as stable.We have array_to_string(anyarray, text) and string_to_array(text, text),
and you'll introduce to_string(anyarray, text, text) and
to_array(text, text, text).I have to repeat it, the behave of this functions are little bit
different. string_to_array and array_to_string are buggy.* it isn't support a NULL
* it doesn't differentiate a empty array and NULL
* we cannot to change default behave of existing functions
* array_to_string is badly marked as IMMUTABLE
This email thread linked to from our TODO list explains that arrays
combined with NULLs have many inconsistenciess:
http://archives.postgresql.org/pgsql-bugs/2008-11/msg00009.php
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Brendan Jurd <direvus@gmail.com> writes:
I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Looking at this, I want to question the implode/explode naming. I think
those names are too cute by half, not particularly mnemonic, not visibly
related to the similar existing functions, and not friendly to any
future extension in the same area.
My first thought is that we should go back to the string_to_array and
array_to_string names. The key reason not to use those names was the
conflict with the old functions if you didn't specify a third argument,
but where is the advantage of not specifying the third argument? It
would be a lot simpler for people to understand if we just said "the
two-argument forms work like this, while the three-argument forms work
like that". This is especially reasonable because the difference in
behavior is about nulls in the array, which is exactly what the third
argument exists to specify.
[ Sorry for not complaining about this before, but I was on vacation
when the previous naming discussion went on. ]
regards, tom lane
On Mon, Aug 9, 2010 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brendan Jurd <direvus@gmail.com> writes:
I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Looking at this, I want to question the implode/explode naming. I think
those names are too cute by half, not particularly mnemonic, not visibly
related to the similar existing functions, and not friendly to any
future extension in the same area.My first thought is that we should go back to the string_to_array and
array_to_string names. The key reason not to use those names was the
conflict with the old functions if you didn't specify a third argument,
but where is the advantage of not specifying the third argument? It
would be a lot simpler for people to understand if we just said "the
two-argument forms work like this, while the three-argument forms work
like that". This is especially reasonable because the difference in
behavior is about nulls in the array, which is exactly what the third
argument exists to specify.[ Sorry for not complaining about this before, but I was on vacation
when the previous naming discussion went on. ]
I can live with that, as long as it's clearly explained in the docs.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
On Mon, Aug 9, 2010 at 4:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brendan Jurd <direvus@gmail.com> writes:
I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Looking at this, I want to question the implode/explode naming. I think
those names are too cute by half, not particularly mnemonic, not visibly
related to the similar existing functions, and not friendly to any
future extension in the same area.My first thought is that we should go back to the string_to_array and
array_to_string names. The key reason not to use those names was the
conflict with the old functions if you didn't specify a third argument,
but where is the advantage of not specifying the third argument? It
would be a lot simpler for people to understand if we just said "the
two-argument forms work like this, while the three-argument forms work
like that". This is especially reasonable because the difference in
behavior is about nulls in the array, which is exactly what the third
argument exists to specify.
Is there any reason why array functions need the type prefix when
other type conversion functions don't? Why didn't we name unnest()
array_unnest()?
merlin
Merlin Moncure <mmoncure@gmail.com> writes:
Is there any reason why array functions need the type prefix when
other type conversion functions don't? Why didn't we name unnest()
array_unnest()?
UNNEST() is in the standard, IIRC, so you'd have to ask the SQL
committee that. (And no, they're not exactly being consistent either,
see array_agg() for example.)
But anyway, my point here is that these functions are close enough to
the existing string_to_array/array_to_string functions that they should
be presented as variants of those, not arbitrarily assigned unrelated
new names. Whether we'd have chosen different names if we had it to do
over is academic.
regards, tom lane
On Aug 9, 2010, at 1:10 PM, Robert Haas wrote:
My first thought is that we should go back to the string_to_array and
array_to_string names. The key reason not to use those names was the
conflict with the old functions if you didn't specify a third argument,
but where is the advantage of not specifying the third argument? It
would be a lot simpler for people to understand if we just said "the
two-argument forms work like this, while the three-argument forms work
like that". This is especially reasonable because the difference in
behavior is about nulls in the array, which is exactly what the third
argument exists to specify.[ Sorry for not complaining about this before, but I was on vacation
when the previous naming discussion went on. ]I can live with that, as long as it's clearly explained in the docs.
+1
David
On Mon, Aug 9, 2010 at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
Is there any reason why array functions need the type prefix when
other type conversion functions don't? Why didn't we name unnest()
array_unnest()?UNNEST() is in the standard, IIRC, so you'd have to ask the SQL
committee that. (And no, they're not exactly being consistent either,
see array_agg() for example.)But anyway, my point here is that these functions are close enough to
the existing string_to_array/array_to_string functions that they should
be presented as variants of those, not arbitrarily assigned unrelated
new names. Whether we'd have chosen different names if we had it to do
over is academic.
I don't array_agg is the same case, because you're aggregating into an
array, not from one. all the same, +1 to your names (didn't like
explode much).
merlin
David E. Wheeler wrote:
On Aug 9, 2010, at 1:10 PM, Robert Haas wrote:
My first thought is that we should go back to the string_to_array and
array_to_string names. The key reason not to use those names was the
conflict with the old functions if you didn't specify a third argument,
but where is the advantage of not specifying the third argument? It
would be a lot simpler for people to understand if we just said "the
two-argument forms work like this, while the three-argument forms work
like that". This is especially reasonable because the difference in
behavior is about nulls in the array, which is exactly what the third
argument exists to specify.[ Sorry for not complaining about this before, but I was on vacation
when the previous naming discussion went on. ]I can live with that, as long as it's clearly explained in the docs.
+1
+1
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Hello
2010/8/9 Tom Lane <tgl@sss.pgh.pa.us>:
Brendan Jurd <direvus@gmail.com> writes:
I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Looking at this, I want to question the implode/explode naming. I think
those names are too cute by half, not particularly mnemonic, not visibly
related to the similar existing functions, and not friendly to any
future extension in the same area.My first thought is that we should go back to the string_to_array and
array_to_string names. The key reason not to use those names was the
conflict with the old functions if you didn't specify a third argument,
but where is the advantage of not specifying the third argument? It
would be a lot simpler for people to understand if we just said "the
two-argument forms work like this, while the three-argument forms work
like that". This is especially reasonable because the difference in
behavior is about nulls in the array, which is exactly what the third
argument exists to specify.
The name isn't important - I believe so you or Robert can choose the
best name. Important is default behave. On an start is idea, so
functions that lost some information isn't optimal - and it is
array_to_string problem - because this function quietly skip NULL
fields, if there are. So it was a motivation to write these functions.
Regards
Pavel Stehule
Show quoted text
[ Sorry for not complaining about this before, but I was on vacation
when the previous naming discussion went on. ]regards, tom lane
Brendan Jurd <direvus@gmail.com> writes:
I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Applied, with the discussed changes and some code editing.
regards, tom lane
2010/8/10 Tom Lane <tgl@sss.pgh.pa.us>:
Brendan Jurd <direvus@gmail.com> writes:
I have attached v4 of the patch against HEAD, and also an incremental
patch showing just my changes against v3.I'll mark this as ready for committer.
Applied, with the discussed changes and some code editing.
regards, tom lane
Thank you very much
Regards
Pavel Stehule