what is good solution for support NULL inside string_to_array function?

Started by Pavel Stehuleover 15 years ago10 messages
#1Pavel Stehule
pavel.stehule@gmail.com

Hello

I understand why we don't support expression 'null'::sometype. But it
does problems with array deserialisation.

postgres=# select array_to_string(ARRAY[10,20,30,NULL,30], '|');
array_to_string
-----------------
10|20|30|30
(1 row)

quietly removing NULL is maybe good for compatibility but is wrong for
functionality. Can we enhance function array_to_string and
string_to_array like:

CREATE OR REPLACE FUNCTION array_to_string(dta anyarray, sep text,
nullsym text)
RETURNS text AS $$
SELECT array_to_string(ARRAY(SELECT coalesce(v::text,$3)
FROM unnest($1) g(v)),$2)
$$ LANGUAGE sql;
CREATE FUNCTION
Time: 231.445 ms
postgres=# select array_to_string(ARRAY[10,20,30,NULL,30], '|', '');
array_to_string
-----------------
10|20|30||30
(1 row)

Time: 230.879 ms
postgres=# select array_to_string(ARRAY[10,20,30,NULL,30], '|', 'NULL');
array_to_string
------------------
10|20|30|NULL|30
(1 row)

Time: 2.031 ms

CREATE OR REPLACE FUNCTION string_to_array(str text, sep text, nullsym text)
RETURNS text[] AS $$
SELECT ARRAY(SELECT CASE WHEN v <> $3 THEN v ELSE NULL END
FROM unnest(string_to_array($1,$2)) g(v))
$$ LANGUAGE sql;
CREATE FUNCTION
Time: 29.044 ms

postgres=# SELECT string_to_array('10,20,30,,40',',','');
string_to_array
--------------------
{10,20,30,NULL,40}
(1 row)

postgres=# SELECT string_to_array('10,20,30,,40',',','')::int[];
string_to_array
--------------------
{10,20,30,NULL,40}
(1 row)

it is correct?

other ideas?

Regards
Pavel Stehule

#2Josh Berkus
josh@agliodbs.com
In reply to: Pavel Stehule (#1)
Re: what is good solution for support NULL inside string_to_array function?

quietly removing NULL is maybe good for compatibility but is wrong for
functionality.

I agree. I wasn't aware of this little misfeature.

Default display for NULL should be a zero-length string.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#2)
Re: what is good solution for support NULL inside string_to_array function?

Josh Berkus <josh@agliodbs.com> writes:

quietly removing NULL is maybe good for compatibility but is wrong for
functionality.

I agree. I wasn't aware of this little misfeature.

Default display for NULL should be a zero-length string.

That's just as broken as Pavel's suggestion. Unless you have something
that is guaranteed distingishable from the output of any non-null value,
you really can't make a significant improvement here.

regards, tom lane

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#3)
Re: what is good solution for support NULL inside string_to_array function?

Tom Lane wrote:

Default display for NULL should be a zero-length string.

That's just as broken as Pavel's suggestion. Unless you have something
that is guaranteed distingishable from the output of any non-null value,
you really can't make a significant improvement here.

Right. This is the problem we solved in CSV processing by distinguishing
between quoted and unquoted values that could be null - the CSV rule is
that a null value isn't quoted.

cheers

andrew

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#3)
Re: what is good solution for support NULL inside string_to_array function?

2010/5/4 Tom Lane <tgl@sss.pgh.pa.us>:

Josh Berkus <josh@agliodbs.com> writes:

quietly removing NULL is maybe good for compatibility but is wrong for
functionality.

I agree.  I wasn't aware of this little misfeature.

Default display for NULL should be a zero-length string.

That's just as broken as Pavel's suggestion.  Unless you have something
that is guaranteed distingishable from the output of any non-null value,
you really can't make a significant improvement here.

I wouldn't modify current two params string_to_array and
array_to_string function. So there are not any default string (maybe
empty string) for NULL. My proposal is new three params functions with

explicit<<< "null string" definition. This cannot break

compatibility and enhance functionality - It is just short cut for
code from my proposal - in C this functionality can by implemented
much faster.

Regards
Pavel

Show quoted text

                       regards, tom lane

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Josh Berkus (#2)
Re: what is good solution for support NULL inside string_to_array function?

2010/5/4 Josh Berkus <josh@agliodbs.com>:

quietly removing NULL is maybe good for compatibility but is wrong for
functionality.

I agree.  I wasn't aware of this little misfeature.

Default display for NULL should be a zero-length string.

I disagree - NULL is NULL, not empty string (Oracle is different)

if array_to_string is equivalent to

x[1] || sep || x[2] || sep || x[3] || sep ....

then correct result is NULL

and then string_to_array and array_to_string are correct, because
string_to_array cannot contain any NULL symbol.

Regards
Pavel Stehule

Show quoted text

--
                                 -- Josh Berkus
                                    PostgreSQL Experts Inc.
                                    http://www.pgexperts.com

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#5)
1 attachment(s)
Re: what is good solution for support NULL inside string_to_array function?

2010/5/4 Pavel Stehule <pavel.stehule@gmail.com>:

2010/5/4 Tom Lane <tgl@sss.pgh.pa.us>:

Josh Berkus <josh@agliodbs.com> writes:

quietly removing NULL is maybe good for compatibility but is wrong for
functionality.

I agree.  I wasn't aware of this little misfeature.

Default display for NULL should be a zero-length string.

That's just as broken as Pavel's suggestion.  Unless you have something
that is guaranteed distingishable from the output of any non-null value,
you really can't make a significant improvement here.

I wouldn't modify current two params string_to_array and
array_to_string function. So there are not any default string (maybe
empty string) for NULL. My proposal is new three params functions with

explicit<<< "null string" definition. This cannot break

compatibility and enhance functionality - It is just short cut for
code from my proposal - in C this functionality can by implemented
much faster.

I did some coding - the patch can be very simple

postgres=# select array_to_string(array[1,2,3,4,5,null],',','*');
array_to_string
-----------------
1,2,3,4,5,*
(1 row)

Time: 0,501 ms
postgres=# select
string_to_array(array_to_string(array[1,2,3,4,5,null],',','*'),',','*');
string_to_array
------------------
{1,2,3,4,5,NULL}
(1 row)

Time: 0,617 ms

postgres=# select string_to_array('1,2,3,4,5,*',',','*')::int[];
string_to_array
------------------
{1,2,3,4,5,NULL}
(1 row)

Time: 0,652 ms

and then string_to_array and array_to_string are orthogonal with NULL.

Pavel

Show quoted text

Regards
Pavel

                       regards, tom lane

Attachments:

3params_arrayfce.diffapplication/octet-stream; name=3params_arrayfce.diffDownload
*** ./src/backend/utils/adt/varlena.c.orig	2010-02-26 03:01:10.000000000 +0100
--- ./src/backend/utils/adt/varlena.c	2010-05-04 15:55:45.677424338 +0200
***************
*** 2985,2990 ****
--- 2985,2992 ----
  	char	   *start_ptr;
  	text	   *result_text;
  	ArrayBuildState *astate = NULL;
+ 	text			*null_sym;
+ 	bool	isnull;
  
  	text_position_setup(inputstring, fldsep, &state);
  
***************
*** 3012,3017 ****
--- 3014,3033 ----
  		PG_RETURN_ARRAYTYPE_P(create_singleton_array(fcinfo, TEXTOID,
  										   PointerGetDatum(inputstring), 1));
  	}
+ 	
+ 	/*
+ 	 * Third argument (if exists is text used instead NULL value.
+ 	 */
+ 	if (PG_NARGS() > 2)
+ 	{
+ 		if (PG_ARGISNULL(2))
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 					 errmsg("text for NULL substitution cannot be NULL")));
+ 		null_sym = PG_GETARG_TEXT_PP(2);
+ 	}
+ 	else
+ 		null_sym = NULL;
  
  	start_posn = 1;
  	/* start_ptr points to the start_posn'th character of inputstring */
***************
*** 3036,3046 ****
  
  		/* must build a temp text datum to pass to accumArrayResult */
  		result_text = cstring_to_text_with_len(start_ptr, chunk_len);
  
  		/* stash away this field */
  		astate = accumArrayResult(astate,
  								  PointerGetDatum(result_text),
! 								  false,
  								  TEXTOID,
  								  CurrentMemoryContext);
  
--- 3052,3073 ----
  
  		/* must build a temp text datum to pass to accumArrayResult */
  		result_text = cstring_to_text_with_len(start_ptr, chunk_len);
+ 		
+ 		/* check if this text is equal to null symbol */
+ 		if (null_sym != NULL)
+ 		{
+ 			
+ 			isnull = DatumGetBool(DirectFunctionCall2(texteq,
+ 										PointerGetDatum(null_sym),
+ 										PointerGetDatum(result_text)));
+ 		}
+ 		else
+ 			isnull = false;
  
  		/* stash away this field */
  		astate = accumArrayResult(astate,
  								  PointerGetDatum(result_text),
! 								  isnull,
  								  TEXTOID,
  								  CurrentMemoryContext);
  
***************
*** 3062,3067 ****
--- 3089,3103 ----
  }
  
  /*
+  * Just wrapper for three params variant string_to_array function
+  */
+ Datum
+ text_to_array_with_nullsym(PG_FUNCTION_ARGS)
+ {
+ 	return text_to_array(fcinfo);
+ }
+ 
+ /*
   * array_to_text
   * concatenate Cstring representation of input array elements
   * using provided field separator
***************
*** 3085,3090 ****
--- 3121,3127 ----
  	int			bitmask;
  	int			i;
  	ArrayMetaState *my_extra;
+ 	char		*null_sym;
  
  	ndims = ARR_NDIM(v);
  	dims = ARR_DIMS(v);
***************
*** 3132,3137 ****
--- 3169,3185 ----
  	bitmap = ARR_NULLBITMAP(v);
  	bitmask = 1;
  
+ 	if (PG_NARGS() > 2)
+ 	{
+ 		if (PG_ARGISNULL(2))
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ 					 errmsg("text for NULL substitution cannot be NULL")));
+ 		null_sym = text_to_cstring(PG_GETARG_TEXT_PP(2));
+ 	}
+ 	else
+ 		null_sym = NULL;
+ 
  	for (i = 0; i < nitems; i++)
  	{
  		Datum		itemvalue;
***************
*** 3140,3146 ****
  		/* Get source element, checking for NULL */
  		if (bitmap && (*bitmap & bitmask) == 0)
  		{
! 			/* we ignore nulls */
  		}
  		else
  		{
--- 3188,3201 ----
  		/* Get source element, checking for NULL */
  		if (bitmap && (*bitmap & bitmask) == 0)
  		{
! 			if (null_sym != NULL)
! 			{
! 				if (printed)
! 					appendStringInfo(&buf, "%s%s", fldsep, null_sym);
! 				else
! 					appendStringInfoString(&buf, null_sym);
! 				printed = true;
! 			}
  		}
  		else
  		{
***************
*** 3173,3178 ****
--- 3228,3243 ----
  	PG_RETURN_TEXT_P(cstring_to_text_with_len(buf.data, buf.len));
  }
  
+ /*
+  * Just only wrapper for three params array_to_string function.
+  */
+ Datum
+ array_to_text_with_nullsym(PG_FUNCTION_ARGS)
+ {
+ 	return array_to_text_with_nullsym(fcinfo);
+ }
+ 
+ 
  #define HEXBASE 16
  /*
   * Convert a int32 to a string containing a base 16 (hex) representation of
#8Merlin Moncure
mmoncure@gmail.com
In reply to: Pavel Stehule (#7)
Re: what is good solution for support NULL inside string_to_array function?

On Tue, May 4, 2010 at 10:05 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

and then string_to_array and array_to_string are orthogonal with NULL.

I like the behavior, but should it share the name with the 2 argument
version given the incompatibility? Maybe make a new function
to_string(anyarray, sep, nullsym='') and deprecate the old one?

merlin

#9Pavel Stehule
pavel.stehule@gmail.com
In reply to: Merlin Moncure (#8)
Re: what is good solution for support NULL inside string_to_array function?

2010/5/4 Merlin Moncure <mmoncure@gmail.com>:

On Tue, May 4, 2010 at 10:05 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

and then string_to_array and array_to_string are orthogonal with NULL.

I like the behavior, but should it share the name with the 2 argument
version given the incompatibility? Maybe make a new function
to_string(anyarray, sep, nullsym='') and deprecate the old one?

maybe to_string X to_array ... Why not? It shorter, maybe it is cleaner

Regards
Pavel

Show quoted text

merlin

#10Steve Crawford
scrawford@pinpointresearch.com
In reply to: Tom Lane (#3)
Re: what is good solution for support NULL inside string_to_array function?

Tom Lane wrote:

Josh Berkus <josh@agliodbs.com> writes:

quietly removing NULL is maybe good for compatibility but is wrong for
functionality.

I agree. I wasn't aware of this little misfeature.

Default display for NULL should be a zero-length string.

That's just as broken as Pavel's suggestion. Unless you have something
that is guaranteed distingishable from the output of any non-null value,
you really can't make a significant improvement here.

regards, tom lane

Is this, perhaps, a generalized case of this long-running discussion
from last year?:
http://archives.postgresql.org/pgsql-hackers/2009-03/msg01350.php

Cheers,
Steve