Null row vs. row of nulls in plpgsql

Started by Tom Laneover 17 years ago10 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

Comments?

regards, tom lane

#2Oleg Serov
serovov@gmail.com
In reply to: Tom Lane (#1)
Fwd: Null row vs. row of nulls in plpgsql

---------- Forwarded message ----------
From: Oleg Serov <serovov@gmail.com>
Date: 2008/9/27
Subject: Re: Null row vs. row of nulls in plpgsql
To: Tom Lane <tgl@sss.pgh.pa.us>

I'm newbie, but i think that adding bool flag to PLpgSQL_row isnull will
handle the problem(like in PLpgSQL_var);

2008/9/27 Tom Lane <tgl@sss.pgh.pa.us>

I looked a bit at the bug report here:

Show quoted text

http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

Comments?

regards, tom lane

#3Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Tom Lane (#1)
Re: Null row vs. row of nulls in plpgsql

On Sat, 2008-09-27 at 14:56 -0400, Tom Lane wrote:

I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

How hard would it be to have a RECORD that has pointers to those
multiple scalar variables ?

Referring again to my favorite ordinary programming language python, you
can have a very elegant way of assigning a "record" (a tuple in
pythonese) to a set of variables and vice versa

rec = 1,2,3
rec

(1, 2, 3)

a,b,c = rec
a

1

c

3

c,b,a

(3, 2, 1)

In other words, tuples are more or less automatically composed and
decomposed on demand.

I have not yet looked how hard the implementation of this would be for
postgreSQL, but at least the concept should be applicable.

----------------
Hannu

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#3)
Re: Null row vs. row of nulls in plpgsql

Hannu Krosing <hannu@2ndQuadrant.com> writes:

Referring again to my favorite ordinary programming language python, you
can have a very elegant way of assigning a "record" (a tuple in
pythonese) to a set of variables and vice versa

There are already perfectly good ways to do that in plpgsql, viz
rowvar := row(x,y,z);
select rowvar.* into x,y,z;

regards, tom lane

#5Greg Stark
greg.stark@enterprisedb.com
In reply to: Tom Lane (#1)
Re: Null row vs. row of nulls in plpgsql

Iirc the reason for this fuzziness came from the SQL spec definition
of IS NULL for rows. As long as you maintain that level of spec-
compliance I don't think there are any other important constraints on
pg behaviour.

greg

--sorry for the top posting but the phone makes it hard to do anything
else.

On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes
needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD
variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to
describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Hannu Krosing
hannu@2ndQuadrant.com
In reply to: Greg Stark (#5)
Re: Null row vs. row of nulls in plpgsql

On Sun, 2008-09-28 at 04:03 +0300, Greg Stark wrote:

Iirc the reason for this fuzziness came from the SQL spec definition
of IS NULL for rows. As long as you maintain that level of spec-
compliance I don't think there are any other important constraints on
pg behaviour.

What does SQL spec say about recursive IS NULL for rows ?

Should we check that IS NULL is true for each row element, or must they
actually be NULL's ?

hannu=# select row(null, null) is NULL;
?column?
----------
t
(1 row)

hannu=# select row(null, row(null, null)) is NULL;
?column?
----------
f
(1 row)

--------------
Hannu

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#5)
Re: Null row vs. row of nulls in plpgsql

Greg Stark <greg.stark@enterprisedb.com> writes:

On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

Iirc the reason for this fuzziness came from the SQL spec definition
of IS NULL for rows. As long as you maintain that level of spec-
compliance I don't think there are any other important constraints on
pg behaviour.

I started to poke into this and found out that it was a bit subtler than
I thought. It'd be possible to associate a "rowisnull" state value
with a row variable, but the problem is that plpgsql treats the row
fields as independent variables that can be accessed without touching
the row. In particular you can assign null or nonnull values to
individual fields. So consider

-- presumably, this'll set rowisnull to TRUE:
rowvar := NULL;
-- this had better cause rowisnull to become FALSE:
rowvar.field1 := 42;
-- does this cause it to become TRUE again?
rowvar.field1 := NULL;

There are a bunch of implementation problems with making any such
behavior happen, since the row field variables don't currently "know"
that they are members of a row, and indeed it's possible for the same
variable to be a member of more than one row. But the core issue is
that this interaction seems to fuzz the distinction between "row is
null" and "all the row's elements are null". In particular, if you
think that rowisnull should be TRUE after the above sequence, then
I think you are saying they are the same thing. So maybe the spec
authors are smarter than we are.

Thoughts? What would a consistent behavior look like?

regards, tom lane

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#7)
Re: Null row vs. row of nulls in plpgsql

2008/9/29 Tom Lane <tgl@sss.pgh.pa.us>:

Greg Stark <greg.stark@enterprisedb.com> writes:

On 27 Sep 2008, at 09:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

Iirc the reason for this fuzziness came from the SQL spec definition
of IS NULL for rows. As long as you maintain that level of spec-
compliance I don't think there are any other important constraints on
pg behaviour.

I started to poke into this and found out that it was a bit subtler than
I thought. It'd be possible to associate a "rowisnull" state value
with a row variable, but the problem is that plpgsql treats the row
fields as independent variables that can be accessed without touching
the row. In particular you can assign null or nonnull values to
individual fields. So consider

-- presumably, this'll set rowisnull to TRUE:
rowvar := NULL;
-- this had better cause rowisnull to become FALSE:
rowvar.field1 := 42;
-- does this cause it to become TRUE again?
rowvar.field1 := NULL;

this sequence is wrong. in SQL rowvar has same behave as pointer. When
you would to fill rowvar you should to call constructor first.

some like
rowvar := NULL; -- null value
rowvar := constructor(null);
rowvar := constructor();
rowvar.field = 42;

regards
Pavel Stehule

Show quoted text

There are a bunch of implementation problems with making any such
behavior happen, since the row field variables don't currently "know"
that they are members of a row, and indeed it's possible for the same
variable to be a member of more than one row. But the core issue is
that this interaction seems to fuzz the distinction between "row is
null" and "all the row's elements are null". In particular, if you
think that rowisnull should be TRUE after the above sequence, then
I think you are saying they are the same thing. So maybe the spec
authors are smarter than we are.

Thoughts? What would a consistent behavior look like?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Null row vs. row of nulls in plpgsql

I assume this is a TODO, right?

---------------------------------------------------------------------------

Tom Lane wrote:

I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, "null::somerowtype") and a
row of null values (eg, "row(null,null,...)::somerowtype"). When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere. For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness. I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row. I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables. But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg "SELECT ... INTO x,y,z".

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: Null row vs. row of nulls in plpgsql

Bruce Momjian <bruce@momjian.us> writes:

I assume this is a TODO, right?

Yah.

regards, tom lane