Supporting NULL elements in arrays

Started by Tom Laneabout 20 years ago22 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I'm starting to think about what it'll take to allow arrays to contain
elements that are NULL. The behavioral semantics seem pretty
straightforward, but there are a couple of areas that need discussion.

One trouble spot is what the I/O representation should look like.
Since 8.0, the array input parser has rejected empty elements:

regression=# select '{a,,c}'::text[];
ERROR: malformed array literal: "{a,,c}"
regression=# select '{a, ,c}'::text[];
ERROR: malformed array literal: "{a, ,c}"
-- the right way to specify a zero-length string is:
regression=# select '{a,"",c}'::text[];
text
----------
{a,"",c}
(1 row)

and so the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of grounds:

1. "Since 8.0" isn't really old enough --- there may well be applications
still out there that think '{a,,c}'::text[] should produce a zero-length
string element and not a NULL element. (Note: this isn't a hazard for
reloading old dump files, because the array output routine has dumped
empty strings as "" since 7.0 if not before.)

2. Even today, the array documentation at
http://developer.postgresql.org/docs/postgres/arrays.html
doesn't actually *say* that empty elements are disallowed.

I don't see any alternatives that seem better, though, and the
empty-element convention at least has the virtue of being reasonably
compatible with what we did for I/O of composite data types. Anyone
have a better idea?

Another issue is what to do with the internal representation. I think
that basically we want to insert a null bitmap just following the
dimension info, with the option to omit the bitmap if there are no
null values. Now, when the bitmap is present, it would be fairly tedious
to calculate the length of the bitmap to determine the offset to the
actual data; and that's an operation that we'd like to be cheap. What
I'm thinking of doing is commandeering the whole "flags" field of
ArrayType (which is currently unused and should be always zero), and
redefining it thus:
zero: bitmap not present (fully backwards-compatible case)
not zero: bitmap is present; value is offset to start of data
I wouldn't propose doing this if I thought we had any pressing reason
to save some array flag bits for some other purpose; but I can't think
of anything else we might want 'em for. Did anyone have any pet ideas
this would foreclose?

Also, with respect to the binary I/O representation, I'd suggest replacing
the "flag" word with "zero if no bitmap, 1 if bitmap present". This would
require both sender and receiver to calculate the bitmap length from the
given dimensions, but it seems best to require that for error-checking
reasons.

Thoughts anyone?

regards, tom lane

#2Pavel Stehule
pavel.stehule@hotmail.com
In reply to: Tom Lane (#1)
Re: Supporting NULL elements in arrays

Hello,

it's great news. My personal opinion about formating NULL values
'{a,"",b} -- wrong, "" means empty string, isn't null
'{a,,b} ' -- ok, maybe not unique,
'{a, NULL, b}' -- longer, clean NULL is NULL
'{a, "NULL", b}' -- "NULL" is not null varchar 'NULL'

Flags for array? Maybe bit isHash? So, hash array can be in line

array->null array->hash array

or flag for nor regular array (sparse array), not type unique arrays (array
can contains different types)

Regards
Pavel Stehule

_________________________________________________________________
Citite se osamele? Poznejte nekoho vyjmecneho diky Match.com.
http://www.msn.cz/

#3Pollard, Mike
mpollard@cincom.com
In reply to: Pavel Stehule (#2)
Re: Supporting NULL elements in arrays

and so the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of

grounds:

Can you use a default to allow the user to specify the default value for
an element? May look a little strange, though, if the user specifies a
default array and a default element value, like:

CREATE TABLE ARR(ARR CHAR(30)[] DEFAULT '{"hello", "good bye"}' NULL);

So the first default is the array default; specify NULL if you don't
want one but do want an array element default; the second, if present,
is the array element default. I'm not sure I like this or not, but it's
an idea.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#2)
Re: Supporting NULL elements in arrays

"Pavel Stehule" <pavel.stehule@hotmail.com> writes:

it's great news. My personal opinion about formating NULL values
'{a, NULL, b}' -- longer, clean NULL is NULL

Unfortunately, that already has a meaning, and it's not that.

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#4)
Re: Supporting NULL elements in arrays

Tom Lane wrote:

"Pavel Stehule" <pavel.stehule@hotmail.com> writes:

it's great news. My personal opinion about formating NULL values
'{a, NULL, b}' -- longer, clean NULL is NULL

Unfortunately, that already has a meaning, and it's not that.

What a pity. I don't see any alternative to the empty element proposal -
it's worth the possible breakage.

cheers

andrew

#6Pollard, Mike
mpollard@cincom.com
In reply to: Andrew Dunstan (#5)
Re: Supporting NULL elements in arrays

and so the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of

grounds:

Can you use a default to allow the user to specify the default value for
an element? May look a little strange, though, if the user specifies a
default array and a default element value, like:

CREATE TABLE ARR(ARR CHAR(30)[] DEFAULT '{"hello", "good bye"}' NULL);

So the first default is the array default; specify NULL if you don't
want one but do want an array element default; the second, if present,
is the array element default. I'm not sure I like this or not, but it's
an idea.

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc

#7Harald Fuchs
hf0923x@protecting.net
In reply to: Pavel Stehule (#2)
Re: Supporting NULL elements in arrays

In article <4370B990.7090100@dunslane.net>,
Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

"Pavel Stehule" <pavel.stehule@hotmail.com> writes:

it's great news. My personal opinion about formating NULL values
'{a, NULL, b}' -- longer, clean NULL is NULL

Unfortunately, that already has a meaning, and it's not that.

What a pity. I don't see any alternative to the empty element proposal
-
it's worth the possible breakage.

How about '{a, \N, b}'?

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pollard, Mike (#6)
Re: Supporting NULL elements in arrays

"Pollard, Mike" <mpollard@cincom.com> writes:

Can you use a default to allow the user to specify the default value for
an element?

There's no mechanism for specifying a default value for individual
elements within an array; and I've never heard one asked for. Certainly
it's far less interesting than allowing nulls within an array. So
I don't think we need to give pride of syntax place to that.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Harald Fuchs (#7)
Re: Supporting NULL elements in arrays

Harald Fuchs <hf0923x@protecting.net> writes:

Tom Lane wrote:

Unfortunately, that already has a meaning, and it's not that.

How about '{a, \N, b}'?

That's valid syntax too, ie, adopting that syntax would break
applications that are not broken today.

Not to mention that it would be gratuitously different from the notation
for NULLs in composite-type literals.

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: Supporting NULL elements in arrays

I wrote:

... the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of grounds:

I just thought of another, potentially fatal objection: it's ambiguous
whether '{}'::text[] should be taken to mean an empty (zero-length)
array or an array containing a single NULL element.

For backwards compatibility it should mean an empty array, but then
there's no way to represent ARRAY(NULL) in data dumps, which won't
do either.

The only workaround that comes to mind is to allow explicit
specification of what's meant: '[1:1]{}' would be needed to represent
the one-null case. Ugly.

Ideas anyone?

regards, tom lane

#11Noname
mark@mark.mielke.cc
In reply to: Tom Lane (#10)
Re: Supporting NULL elements in arrays

On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote:

I wrote:

... the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of grounds:

I just thought of another, potentially fatal objection: it's ambiguous
whether '{}'::text[] should be taken to mean an empty (zero-length)
array or an array containing a single NULL element.

To take another perspective on this, though - it isn't possible to have
NULL array elements right now, therefore, there is no precident, and who
is to say that {NULL} <> {}?

For example:

vhosts=> select ('{1, 3}'::text[])[4];
text
------

(1 row)

vhosts=> select ('{}'::text[])[4];
text
------

(1 row)

Perhaps NULL at end of array never needs to be stored, and arrays can
be considered to have an infinite number of NULL values at the end?

For array operations that require a length, such as cross-product, or
whatever, the 'length' of the array, would be the number of elements
before the infinite number of NULL values at the end.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#12Jim C. Nasby
jnasby@pervasive.com
In reply to: Tom Lane (#10)
Re: Supporting NULL elements in arrays

On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote:

I wrote:

... the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of grounds:

I just thought of another, potentially fatal objection: it's ambiguous
whether '{}'::text[] should be taken to mean an empty (zero-length)
array or an array containing a single NULL element.

For backwards compatibility it should mean an empty array, but then
there's no way to represent ARRAY(NULL) in data dumps, which won't
do either.

The only workaround that comes to mind is to allow explicit
specification of what's meant: '[1:1]{}' would be needed to represent
the one-null case. Ugly.

Instead of bending over backwards to try and support older cases, would
a compatability mode be possible? Seems that would solve a lot of
problems.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#13Joe Conway
mail@joeconway.com
In reply to: Jim C. Nasby (#12)
Re: Supporting NULL elements in arrays

Jim C. Nasby wrote:

On Tue, Nov 08, 2005 at 07:21:34PM -0500, Tom Lane wrote:

... the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of grounds:

I just thought of another, potentially fatal objection: it's ambiguous
whether '{}'::text[] should be taken to mean an empty (zero-length)
array or an array containing a single NULL element.

For backwards compatibility it should mean an empty array, but then
there's no way to represent ARRAY(NULL) in data dumps, which won't
do either.

The only workaround that comes to mind is to allow explicit
specification of what's meant: '[1:1]{}' would be needed to represent
the one-null case. Ugly.

Instead of bending over backwards to try and support older cases, would
a compatability mode be possible? Seems that would solve a lot of
problems.

Last time I thought about this problem, that's what I concluded. I don't
think there is a reasonable and backward compatible solution.

I also think the best non-compatible solution is to require non-numeric
elements to be delimited (double quotes, configurable?), and use NULL
unadorned to represent NULL.

Joe

#14Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Joe Conway (#13)
Re: Supporting NULL elements in arrays

I also think the best non-compatible solution is to require non-numeric
elements to be delimited (double quotes, configurable?), and use NULL
unadorned to represent NULL.

I think the ultimate solution should have null values represented by
NULL... I mean NULL is NULL :)

Chris

#15Marc Munro
marc@bloodnok.com
In reply to: Christopher Kings-Lynne (#14)
Re: Supporting NULL elements in arrays

How about representing null array elements with a special-case cast-like
null? Something like {::null}

__
Marc

On Tue, 2005-11-08 at 23:30 -0400, pgsql-hackers-owner@postgresql.org
wrote:

Show quoted text

Date: Tue, 08 Nov 2005 19:21:34 -0500
From: Tom Lane <tgl@sss.pgh.pa.us>
To: pgsql-hackers@postgreSQL.org
Subject: Re: Supporting NULL elements in arrays
Message-ID: <10097.1131495694@sss.pgh.pa.us>

I wrote:

... the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of

grounds:

I just thought of another, potentially fatal objection: it's ambiguous
whether '{}'::text[] should be taken to mean an empty (zero-length)
array or an array containing a single NULL element.

For backwards compatibility it should mean an empty array, but then
there's no way to represent ARRAY(NULL) in data dumps, which won't
do either.

The only workaround that comes to mind is to allow explicit
specification of what's meant: '[1:1]{}' would be needed to represent
the one-null case. Ugly.

Ideas anyone?

regards, tom lane

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marc Munro (#15)
Re: Supporting NULL elements in arrays

Marc Munro <marc@bloodnok.com> writes:

How about representing null array elements with a special-case cast-like
null? Something like {::null}

The problem with *any* proposal for let's-spell-NULL-like-this
is that "this" is inevitably a string that is valid as a text
data value. Making the string weirder may reduce the odds of
a failure but the risk is still there, and meanwhile you're making
the null marker less readable. (The above has the additional problem
of being indistinguishable from a sub-array.)

I think we really need something that is rejected as syntactically
invalid by the existing array input parser.

regards, tom lane

#17Noname
mark@mark.mielke.cc
In reply to: Tom Lane (#16)
Re: Supporting NULL elements in arrays

On Wed, Nov 09, 2005 at 11:22:37AM -0500, Tom Lane wrote:

Marc Munro <marc@bloodnok.com> writes:

How about representing null array elements with a special-case cast-like
null? Something like {::null}

The problem with *any* proposal for let's-spell-NULL-like-this
is that "this" is inevitably a string that is valid as a text
data value. Making the string weirder may reduce the odds of
a failure but the risk is still there, and meanwhile you're making
the null marker less readable. (The above has the additional problem
of being indistinguishable from a sub-array.)

I think we really need something that is rejected as syntactically
invalid by the existing array input parser.

Hehehe... Feeling silly here:

{1,2,}NULL{,4,5,}NULL

What are the feelings on representing it similar to how it would be
in memory?

{1,2,,4,5,}(,,NULL,,,NULL)

Then, there is always the possibility of including a new array text format
for arrays that contain nulls (Q for quote?):

Q{1,2,NULL,4,5,NULL}

This would activate a mode that would fully quote any non-numeric,
non-array arguments, allowing NULL to be unambiguous?

Q{'a','b',NULL,'c','d',NULL}

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#18Ron Mayer
rm_pg@cheapcomplexdevices.com
In reply to: Joe Conway (#13)
Re: Supporting NULL elements in arrays

Joe Conway wrote:

Last time I thought about this problem, that's what I concluded. I don't
think there is a reasonable and backward compatible solution.

I also think the best non-compatible solution is to require non-numeric
elements to be delimited (double quotes, configurable?), and use NULL
unadorned to represent NULL.

If we're going non-computable, would something that's a superset
of the SQL Standard's array value constructor be useful; or is
the standard's arrays so limited as to not even address the
things that cause issues for postgresql arrays?

If I read the confusing thing right, I think the standard
does use NULL for nulls in arrays, single-quotes for strings,
etc. like "ARRAY['FOO',null,'BAR']" and unadorned numbers
for numbers in arrays. That's similar to what I think Joe
suggested, but with single rather than double quotes?

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ron Mayer (#18)
Re: Supporting NULL elements in arrays

Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:

If we're going non-computable, would something that's a superset
of the SQL Standard's array value constructor be useful;

I think you've confused ARRAY[] expressions with the I/O representation
for array values. There are many good reasons why those are not
identical, starting with the fact that string literals inside an ARRAY[]
would be subject to different escaping conventions than an I/O value is.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#13)
Re: Supporting NULL elements in arrays

Joe Conway <mail@joeconway.com> writes:

Jim C. Nasby wrote:

Instead of bending over backwards to try and support older cases, would
a compatability mode be possible? Seems that would solve a lot of
problems.

Last time I thought about this problem, that's what I concluded. I don't
think there is a reasonable and backward compatible solution.

I also think the best non-compatible solution is to require non-numeric
elements to be delimited (double quotes, configurable?), and use NULL
unadorned to represent NULL.

After further thought I'm starting to agree with this point of view as
well. I propose the following details:

1. A null element is represented as the unquoted string NULL
(case-insensitive on input). Any use of quotes or backslashes
turns it into a simple string value "NULL" instead. array_out
will need to be careful to quote any string that matches NULL.

2. For backwards compatibility, we'll invent a GUC parameter
enable_array_nulls that defeats recognition of NULL in array_in.
(Any better ideas about the name of the parameter?)

This isn't ideal because of the compatibility risk, but once we get past
the transition period it's a reasonable definition.

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: Supporting NULL elements in arrays

I wrote:

... Now, when the bitmap is present, it would be fairly tedious
to calculate the length of the bitmap to determine the offset to the
actual data; and that's an operation that we'd like to be cheap. What
I'm thinking of doing is commandeering the whole "flags" field of
ArrayType (which is currently unused and should be always zero), and
redefining it thus:
zero: bitmap not present (fully backwards-compatible case)
not zero: bitmap is present; value is offset to start of data
I wouldn't propose doing this if I thought we had any pressing reason
to save some array flag bits for some other purpose; but I can't think
of anything else we might want 'em for. Did anyone have any pet ideas
this would foreclose?

On trying to recompile things, I find that contrib/intarray is broken
by this change, because it's using the flags field for its own purposes:

/*
* flags for gist__int_ops, use ArrayType->flags
* which is unused (see array.h)
*/
#define LEAFKEY (1<<31)
#define ISLEAFKEY(x) ( ((ArrayType*)(x))->flags & LEAFKEY )

It seems likely that intarray is going to need some rather significant
work anyway to deal with null elements, so this seems to me to be not
necessarily a fatal objection. But why exactly does intarray need to
play games with the contents of an array value?

regards, tom lane

#22Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#21)
Re: Supporting NULL elements in arrays

On trying to recompile things, I find that contrib/intarray is broken
by this change, because it's using the flags field for its own purposes:

/*
* flags for gist__int_ops, use ArrayType->flags
* which is unused (see array.h)
*/
#define LEAFKEY (1<<31)
#define ISLEAFKEY(x) ( ((ArrayType*)(x))->flags & LEAFKEY )

It seems likely that intarray is going to need some rather significant
work anyway to deal with null elements, so this seems to me to be not
necessarily a fatal objection. But why exactly does intarray need to
play games with the contents of an array value?

Sorry, intarray was first our module for PgSQL. I'll remove usage of
ArrayType->flags soon.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/