[PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
On 17 March 2013 05:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brendan Jurd <direvus@gmail.com> writes:
On 16 March 2013 09:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The thing is that that syntax creates an array of zero dimensions,
not one that has 1 dimension and zero elements.I'm going to ask the question that immediately comes to mind: Is there
anything good at all about being able to define a zero-dimensional
array?Perhaps not. I think for most uses, a 1-D zero-length array would be
just as good. I guess what I'd want to know is whether we also need
to support higher-dimensional zero-size arrays, and if so, what does
the I/O syntax for those look like?Another fly in the ointment is that if we do redefine '{}' as meaning
something other than a zero-D array, how will we handle existing
database entries that are zero-D arrays?
Hello hackers,
I submit a patch to rectify the weird and confusing quirk of Postgres
to use "zero dimensions" to signify an empty array.
Rationale:
The concept of an array without dimensions is a) incoherent, and b)
leads to astonishing results from our suite of user-facing array
functions. array_length, array_dims, array_ndims, array_lower and
array_upper all return NULL when presented such an array.
When a user writes ARRAY[]::int[], they expect to get an empty array,
but instead we've been giving them a bizarre semi-functional
proto-array. Not cool.
Approach:
The patch teaches postgres to regard zero as an invalid number of
dimensions (which it very much is), and allows instead for fully armed
and operational empty arrays.
The simplest form of empty array is the 1-D empty array (ARRAY[] or
'{}') but the patch also allows for empty arrays over multiple
dimensions, meaning that the final dimension has a length of zero, but
the other dimensions may have any valid length. For example,
'{{},{},{}}' is a 2-D empty array with dimension lengths {3,0} and
dimension bounds [1:3][1:0].
An empty array dimension may have any valid lower bound, but by
default the lower bound will be 1 and the upper bound will therefore
be 0.
Any zero-D arrays read in from existing database entries will be
output as 1-D empty arrays from array_recv.
There is an existing limitation where you cannot extend a multi-D
array by assigning values to subscripts or slices. I've made no
attempt to address that limitation.
The patch improves some error reporting, particularly by adding
errdetail messages for ereports of problems parsing a curly-brace
array literal.
There is some miscellaneous code cleanup included; I moved the
hardcoded characters '{', '}', etc. into constants, unwound a
superfluous inner loop from ArrayCount, and corrected some mistakes in
code comments and error texts. If preferred for reviewing, I can
split those changes (and/or the new errdetails) out into a separate
patch.
Incompatibility:
This patch introduces an incompatible change in the behaviour of the
aforementioned array functions -- instead of returning NULL for empty
arrays they return meaningful values. Applications relying on the old
behaviour to test for emptiness may be disrupted. One can
future-proof against this by changing e.g.
IF array_length(arr, 1) IS NULL ...
to
IF coalesce(array_length(arr, 1), 0) = 0 ...
There is also a change in the way array subscript assignment works.
Previously it wasn't possible to make a distinction between assigning
to an empty array and assigning to a NULL, so in either case an
expression like "arr[4] := 1" would create "[4:4]={1}". Now there is
a distinction. If you assign to an empty array you will get "{NULL,
NULL, NULL, 1}", whereas if you assign to a NULL you'll get
"[4:4]={1}".
Regression tests:
The regression tests were updated to reflect behavioural changes.
Documentation:
A minor update to the prose in chapter 8.15 is included in the patch.
Issues:
Fixed-length arrays (like oidvector) are zero-based, which means that
they are rendered into string form with their dimension info shown.
So an empty oidvector now renders as "[0:-1]={}", which is correct but
ugly. I'm not delighted with this side-effect but I'm not sure what
can be done about it. I'm open to ideas.
Diffstat:
doc/src/sgml/array.sgml | 4 +-
src/backend/executor/execQual.c | 77 +++++-
src/backend/utils/adt/array_userfuncs.c | 23 +-
src/backend/utils/adt/arrayfuncs.c | 671
+++++++++++++++++++++++++----------------------
src/backend/utils/adt/arrayutils.c | 4 +
src/backend/utils/adt/xml.c | 19 +-
src/include/c.h | 1 +
src/include/utils/array.h | 4 +
src/pl/plpython/plpy_typeio.c | 3 -
src/test/isolation/expected/timeouts.out | 16 +-
src/test/isolation/specs/timeouts.spec | 8 +-
src/test/regress/expected/arrays.out | 55 ++--
src/test/regress/expected/create_function_3.out | 2 +-
src/test/regress/expected/polymorphism.out | 2 +-
src/test/regress/sql/arrays.sql | 6 +-
15 files changed, 519 insertions(+), 376 deletions(-)
I'll add this to the Commit Fest app. It is intended to resolve TODO
item "Improve handling of empty arrays".
Cheers,
BJ
Attachments:
On Mar 20, 2013, at 4:45 PM, Brendan Jurd <direvus@gmail.com> wrote:
I submit a patch to rectify the weird and confusing quirk of Postgres
to use "zero dimensions" to signify an empty array.
Epic. Thank you. I’m very glad now that I complained about this (again)!
Best,
David
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/20/2013 04:45 PM, Brendan Jurd wrote:
Incompatibility:
This patch introduces an incompatible change in the behaviour of the
aforementioned array functions -- instead of returning NULL for empty
arrays they return meaningful values. Applications relying on the old
behaviour to test for emptiness may be disrupted. One can
As a heavy user of arrays, I support this patch as being worth the
breakage of backwards compatibility. However, that means it certainly
will need to wait for 9.4 to provide adequate warning.
Brendan, how hard would it be to create a GUC for backwards-compatible
behavior?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 25 March 2013 13:02, Josh Berkus <josh@agliodbs.com> wrote:
On 03/20/2013 04:45 PM, Brendan Jurd wrote:
Incompatibility:
This patch introduces an incompatible change in the behaviour of the
aforementioned array functions -- instead of returning NULL for empty
arrays they return meaningful values. Applications relying on the old
behaviour to test for emptiness may be disrupted. One canAs a heavy user of arrays, I support this patch as being worth the
breakage of backwards compatibility. However, that means it certainly
will need to wait for 9.4 to provide adequate warning.
Thanks Josh, I appreciate the support.
Brendan, how hard would it be to create a GUC for backwards-compatible
behavior?
Good idea. I don't know how hard it would be (never messed with GUCs
before), but I'll take a swing at it and see how it works out.
Cheers,
BJ
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Brendan Jurd <direvus@gmail.com> writes:
On 25 March 2013 13:02, Josh Berkus <josh@agliodbs.com> wrote:
Brendan, how hard would it be to create a GUC for backwards-compatible
behavior?
Good idea.
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 March 2013 00:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Brendan Jurd <direvus@gmail.com> writes:
On 25 March 2013 13:02, Josh Berkus <josh@agliodbs.com> wrote:
Brendan, how hard would it be to create a GUC for backwards-compatible
behavior?Good idea.
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.
They are? Well okay then. I'm not deeply attached to the GUC thing,
it just seemed like a friendly way to ease the upgrade path. But if
it's dangerous somehow I'm happy to drop it.
Cheers,
BJ
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Brendan Jurd <direvus@gmail.com> writes:
On 26 March 2013 00:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.
They are?
Yeah, they are, because things break when they're set wrong.
Sometimes we have to have one anyway, but AFAICS this patch is just
proposing to change corner cases that most apps don't hit. I think
we'd be better off without a knob for it.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 21 March 2013 10:45, Brendan Jurd <direvus@gmail.com> wrote:
src/test/isolation/expected/timeouts.out | 16 +-
src/test/isolation/specs/timeouts.spec | 8 +-
Oops, looks like some unrelated changes made their way into the
original patch. Apologies. Here's a -v2 patch, sans stowaways.
Cheers,
BJ
Attachments:
Tom,
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.
Really? I thought that standard_conforming_strings was a great example
of how to ease our users into a backwards-compatibility break. My
thought was that we change the behavior in 9.4, provide a
backwards-compatible GUC with warnings in the logs for two versions, and
then take the GUC away.
Whether that's worth the effort, though, is a question of how many
people are going to be affected by the compatibility break. Maybe I can
do a survey.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Mar 25, 2013 at 5:14 PM, Josh Berkus <josh@agliodbs.com> wrote:
Tom,
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.Really? I thought that standard_conforming_strings was a great example
of how to ease our users into a backwards-compatibility break.
It is, but it was made in full knowledge of the risks and costs.
What do you do if you're given two modules that have opposite
expectations for this variable? One module or the other might have
hidden unexpected bugs or even security holes if the variable is set
in a way it doesn't expect. You have to pick one value for the whole
database.
I'm not as sanguine as Tom is about how likely these corner cases will
be met actually. As far as I can tell checking IS NULL on
array_length() was the supported way to check for 0-length arrays
previously so I suspect a lot of applications will need to be updated.
But it's not a hard update to do and is the kind of update that's
often needed on major database upgrades.
At least if it's a clean break then everyone on 9.3 knows they need to
do IS NULL and everyone on 9.4 knows they need to check for 0. If
there's a GUC then people need to code defensively without knowing
which semantics they'll get. And if they test they need to test twice
under both settings just in case the user's database has the other
setting.
--
greg
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 March 2013 05:26, Greg Stark <stark@mit.edu> wrote:
I'm not as sanguine as Tom is about how likely these corner cases will
be met actually. As far as I can tell checking IS NULL on
array_length() was the supported way to check for 0-length arrays
previously
Correct. There was no other way to check for empty arrays, because
every empty array was zero-D, and zero-D returns NULL from all array
interrogation functions, even array_ndims (which is totally bonkers).
At least if it's a clean break then everyone on 9.3 knows they need to
do IS NULL and everyone on 9.4 knows they need to check for 0.
And, as I pointed out in at the top of the thread, you can write
"coalesce(array_length(a,1), 0) = 0" if you want to be confident your
code will continue work in either case, and for some folks I expect
that will be the least painful way to upgrade from 9.3 -> 9.4.
Cheers,
BJ
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/25/2013 10:28 PM, Tom Lane wrote:
Yeah, they are, because things break when they're set wrong.
They also make debugging and support harder; you need to get an
ever-growing list of GUC values from the user to figure out what their
query does. bytea_output, standard_conforming_strings, etc. Yick.
That said, I don't have a better answer for introducing non-BC changes.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013.03.25 5:55 PM, Craig Ringer wrote:
On 03/25/2013 10:28 PM, Tom Lane wrote:
Yeah, they are, because things break when they're set wrong.
They also make debugging and support harder; you need to get an
ever-growing list of GUC values from the user to figure out what their
query does. bytea_output, standard_conforming_strings, etc. Yick.That said, I don't have a better answer for introducing non-BC changes.
Given the general trouble GUC values cause, is there a plan to deprecate and
remove each of the existing ones over time? As long as post-removal there isn't
any actual loss of functionality, but users might have to change their code to
do it "the one true way", that would seem a good thing. -- Darren Duncan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013.03.25 6:03 PM, Darren Duncan wrote:
On 2013.03.25 5:55 PM, Craig Ringer wrote:
On 03/25/2013 10:28 PM, Tom Lane wrote:
Yeah, they are, because things break when they're set wrong.
They also make debugging and support harder; you need to get an
ever-growing list of GUC values from the user to figure out what their
query does. bytea_output, standard_conforming_strings, etc. Yick.That said, I don't have a better answer for introducing non-BC changes.
Given the general trouble GUC values cause, is there a plan to deprecate and
remove each of the existing ones over time? As long as post-removal there isn't
any actual loss of functionality, but users might have to change their code to
do it "the one true way", that would seem a good thing. -- Darren Duncan
To clarify, I mean GUC related to backwards compatibility matters, such as
bytea_output or standard_conforming_strings, things that affect the logical
behavior of code. I don't mean all GUC, not at all, most of the ones I know
about should remain configurable. -- Darren Duncan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Mar 25, 2013 at 10:14:15AM -0700, Josh Berkus wrote:
Tom,
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.Really? I thought that standard_conforming_strings was a great example
of how to ease our users into a backwards-compatibility break. My
thought was that we change the behavior in 9.4, provide a
backwards-compatible GUC with warnings in the logs for two versions, and
then take the GUC away.
standard_conforming_strings is not a good example because it took 5+
years to implement the change, and issued warnings about non-standard
use for several releases --- it is not a pattern to follow.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
On Mon, Mar 25, 2013 at 10:14:15AM -0700, Josh Berkus wrote:
No, it *isn't* a good idea. GUCs that change application-visible
semantics are dangerous. We should have learned this lesson by now.
Really? I thought that standard_conforming_strings was a great example
of how to ease our users into a backwards-compatibility break. My
thought was that we change the behavior in 9.4, provide a
backwards-compatible GUC with warnings in the logs for two versions, and
then take the GUC away.
standard_conforming_strings is not a good example because it took 5+
years to implement the change, and issued warnings about non-standard
use for several releases --- it is not a pattern to follow.
s_c_s was an example of the worst possible case: where the behavioral
change not merely breaks applications, but breaks them in a way that
creates easily-exploitable security holes. We *had* to take that one
really slow, and issue warnings for several years beforehand (and IIRC,
there were still gripes from people who complained that we'd caused them
security problems). I can't imagine that we'd go to that kind of
trouble for any less-sensitive behavioral change.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 03/20/2013 04:45 PM, Brendan Jurd wrote:
Incompatibility:
This patch introduces an incompatible change in the behaviour of the
aforementioned array functions -- instead of returning NULL for empty
arrays they return meaningful values. Applications relying on the old
behaviour to test for emptiness may be disrupted. One canAs a heavy user of arrays, I support this patch as being worth the
breakage of backwards compatibility. However, that means it certainly
will need to wait for 9.4 to provide adequate warning.
I expect to lose this argument, but I think this is a terrible idea.
Users really hate it when they try to upgrade and find that they, uh,
can't, because of some application-level incompatibility like this.
They hate it twice as much when the change is essentially cosmetic.
There's no functional problems with arrays as they exist today that
this change would solve.
The way to make a change like this without breaking things for users
is to introduce a new type with different behavior and gradually
deprecate the old one. Now, maybe it doesn't seem worth doing that
for a change this small. But if so, I think that's evidence that this
isn't worth changing in the first place, not that it's worth changing
without regard for backwards-compatibility.
Personally, I think if we're going to start whacking around the
behavior here and risk inconveniencing our users, we ought to think a
little larger. The fundamental thing that's dictating the current
behavior is that we have arrays of between 1 and 6 dimensions all
rolled up under a single data type. But in many cases, if not nearly
all cases, what people want is, specifically, a one-dimensional array.
If we were going to actually bite the bullet and create separate data
types for each possible number of array dimensions... and maybe fix
some other problems at the same time... then the effort involved in
ensuring backward-compatibility might seem like time better spent.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
2013/3/26 Robert Haas <robertmhaas@gmail.com>:
On Sun, Mar 24, 2013 at 10:02 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 03/20/2013 04:45 PM, Brendan Jurd wrote:
Incompatibility:
This patch introduces an incompatible change in the behaviour of the
aforementioned array functions -- instead of returning NULL for empty
arrays they return meaningful values. Applications relying on the old
behaviour to test for emptiness may be disrupted. One canAs a heavy user of arrays, I support this patch as being worth the
breakage of backwards compatibility. However, that means it certainly
will need to wait for 9.4 to provide adequate warning.I expect to lose this argument, but I think this is a terrible idea.
Users really hate it when they try to upgrade and find that they, uh,
can't, because of some application-level incompatibility like this.
They hate it twice as much when the change is essentially cosmetic.
There's no functional problems with arrays as they exist today that
this change would solve.The way to make a change like this without breaking things for users
is to introduce a new type with different behavior and gradually
deprecate the old one. Now, maybe it doesn't seem worth doing that
for a change this small. But if so, I think that's evidence that this
isn't worth changing in the first place, not that it's worth changing
without regard for backwards-compatibility.Personally, I think if we're going to start whacking around the
behavior here and risk inconveniencing our users, we ought to think a
little larger. The fundamental thing that's dictating the current
behavior is that we have arrays of between 1 and 6 dimensions all
rolled up under a single data type. But in many cases, if not nearly
all cases, what people want is, specifically, a one-dimensional array.
If we were going to actually bite the bullet and create separate data
types for each possible number of array dimensions... and maybe fix
some other problems at the same time... then the effort involved in
ensuring backward-compatibility might seem like time better spent.
I understand, but I don't agree. W have to fix impractical design of
arrays early. A ARRAY is 1st class - so there is not possible to use
varchar2 trick.
if we don't would to use GUC, what do you think about compatible
extension? We can overload a system functions behave. This can solve a
problem with updates and migrations.
Regards
Pavel
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 26 March 2013 22:57, Robert Haas <robertmhaas@gmail.com> wrote:
They hate it twice as much when the change is essentially cosmetic.
There's no functional problems with arrays as they exist today that
this change would solve.
We can't sensibly test for whether an array is empty. I'd call that a
functional problem.
The NULL return from array_{length,lower,upper,ndims} is those
functions' way of saying their arguments failed a sanity check. So we
cannot distinguish in a disciplined way between a valid, empty array,
and bad arguments. If the zero-D implementation had been more
polished and say, array_ndims returned zero, we had provided an
array_empty function, or the existing functions threw errors for silly
arguments instead of returning NULL, then I'd be more inclined to see
your point. But as it stands, the zero-D implementation has always
been half-baked and slightly broken, we just got used to working
around it.
Cheers,
BJ
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I expect to lose this argument, but I think this is a terrible idea.
Users really hate it when they try to upgrade and find that they, uh,
can't, because of some application-level incompatibility like this.
They hate it twice as much when the change is essentially cosmetic.
There's no functional problems with arrays as they exist today that
this change would solve.
Sure there is. How do you distinguish between an array which is NULL
and an array which is empty?
Also, the whole array_dims is NULL thing trips up pretty much every
single PG user who uses arrays for the first time. I'd expect when we
announce the fix, we'll find that many users where doing the wrong thing
in the first place and didn't know why it wasn't working.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers