Different results in a loop with RECORD vs ROWTYPE...

Started by Sean Chittendenalmost 23 years ago11 messagesbugs
Jump to latest
#1Sean Chittenden
sean@chittenden.org

I'm looping through a single column from a set of results. If I store
the result in a ROWTYPE of the table, I get NULL values back. If I
store them in a RECORD type, I get the proper values and life is
happy. I'm not sure why there'd be any difference, but it screams bug
to me and I haven't been able to reproduce it outside of my production
schema.

I haven't been able to bottle up a test sequence that'd allow others
to repeat this out in the wild, unfortunately, but simply changing
between s.t%ROWTYPE and RECORD fixes the problem... and it shouldn't
as far as I can tell. Here is a munged version of the
environment/function I'm using:

CREATE TABLE s.t (
a BIGINT NOT NULL,
b BIGINT NOT NULL,
c INT NOT NULL DEFAULT 1::INT,
d BIGINT NOT NULL
);

CREATE FUNCTION s.t_ins()
RETURNS TRIGGER
EXTERNAL SECURITY DEFINER
AS '
DECLARE
r_t RECORD; -- s.t%ROWTYPE;
BEGIN
SELECT CURRVAL(''s1.seq1'') INTO NEW.d;
FOR r_t IN SELECT z.b FROM s.t z WHERE z.c = NEW.c LOOP
RAISE INFO ''b: %, c: %'', r_t.b, NEW.c;
PERFORM s.f(r_t.b,NEW.c);
END LOOP;

RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER t_ins BEFORE INSERT ON s.t FOR EACH ROW
EXECUTE PROCEDURE s.t_ins();

I suppose the specifics don't matter, but, here's the difference in
output if I change between RECORD and ROWTYPE:

Using RECORD:

psql:file.sql:1429: INFO: b: 1, c: 4
psql:file.sql:1429: INFO: b: 2, c: 4
psql:file.sql:1429: INFO: b: 3, c: 4
INSERT 2164848 1

And using ROWTYPE:

psql:file.sql:1429: INFO: b: <NULL>, c: 4
psql:file.sql:1429: ERROR: ExecInsert: Fail to add null value in not null attribute b
CONTEXT: PL/pgSQL function f line 13 at SQL statement
PL/pgSQL function t_ins line 7 at unknown

??? The _only_ difference between the two is changing things from
RECORD to ROWTYPE. Inbetween each test run, I'm doing a drop database
and recreating everything from scratch so the test should be clean.

Like I've said, I've tried bottling up a test case that someone else
can use to debug this, but I haven't been able to find anything
reproducible or anything in the code that'd suggest what the problem
is. I'm hoping that someone who's familiar with pl/pgsql's guts can
look at this and go, "hrm... It's a problem with assigning a single
value into a rowtype or SPI's memory handling is being overly
aggressive in cleaning this up because of ____," or something, but I
haven't been able to reproduce it with simpler test cases. :(

This is using a snapshot from HEAD on 2003.05.08. Here's the test
case I've been trying to use to reproduce this, for those interested
(this test proves nothing, only is a more complete version of what's
listed above that can be run verbatim). Because the below test case
doesn't work, I'm worried that this is a memory issue and not related
to a specific action, which concerns me a great deal.

\c template1
DROP DATABASE test;
CREATE DATABASE test;
\c test pgsql
CREATE FUNCTION public.plpgsql_call_handler () RETURNS language_handler
AS '$libdir/plpgsql', 'plpgsql_call_handler'
LANGUAGE c;

CREATE TRUSTED PROCEDURAL LANGUAGE plpgsql HANDLER public.plpgsql_call_handler;
\c test
CREATE SCHEMA s;
CREATE TABLE s.t (i INT, j INT);
CREATE SEQUENCE s.seq;
INSERT INTO s.t (i) VALUES (1);
INSERT INTO s.t (i) VALUES (2);
INSERT INTO s.t (i) VALUES (3);
CREATE OR REPLACE FUNCTION s.t_ins()
RETURNS TRIGGER
EXTERNAL SECURITY DEFINER
AS '
DECLARE
r_t s.t%ROWTYPE; -- RECORD;
BEGIN
SELECT NEXTVAL(''s.seq'') INTO NEW.j;
FOR r_t IN SELECT a.i FROM s.t a WHERE a.i IS NOT NULL LOOP
RAISE INFO ''i: %\tj: %'', r_t.i, NEW.j;
END LOOP;

RETURN NEW;
END;
' LANGUAGE 'plpgsql';
CREATE TRIGGER t_ins BEFORE INSERT ON s.t FOR EACH ROW EXECUTE PROCEDURE s.t_ins();
INSERT INTO s.t (i) VALUES (4);
INSERT INTO s.t (i) VALUES (5);
SELECT * FROM s.t;

Any hints are invaluable. -sc

--
Sean Chittenden

#2Sean Chittenden
sean@chittenden.org
In reply to: Sean Chittenden (#1)
Re: Different results in a loop with RECORD vs ROWTYPE...

I'm looping through a single column from a set of results. If I
store the result in a ROWTYPE of the table, I get NULL values back.
If I store them in a RECORD type, I get the proper values and life
is happy. I'm not sure why there'd be any difference, but it
screams bug to me and I haven't been able to reproduce it outside of
my production schema.

Hrm, sorry for replying to myself, but moving the rest of uses of
ROWTYPE to RECORD has solved what I originally thought was a logic
problem with recursive pl/pgsql calls, but was a problem stemming from
using ROWTYPE instead of RECORD. Using ROWTYPE, I wasn't getting
results back that I was expecting, using RECORD, I was getting the
expected rows back so that I could operate on them in a loop. ROWTYPE
seems to be broken somehow, somewhere, though I'm not sure in what way
or where. Apologies in advance for the lack of more details... I'm
not sure where to look to get more details in this instance.

My best guess is there's a bug in exec_assign_value in that it doesn't
handle assigning to PLPGSQL_DTYPE_ROW... but I'm not seeing anything
in the error logs and there aren't any errors being thrown. So,
failing that, maybe something in exec_move_row or exec_stmt_fors in
pl_exec.c, but I can't see anything that really jumps out other than
that.

Other ideas? -sc

--
Sean Chittenden

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#2)
Re: Different results in a loop with RECORD vs ROWTYPE...

Sean Chittenden <sean@chittenden.org> writes:

My best guess is there's a bug in exec_assign_value in that it doesn't
handle assigning to PLPGSQL_DTYPE_ROW... but I'm not seeing anything
in the error logs and there aren't any errors being thrown.

Can you give us a test case now that you understand the issue better?

regards, tom lane

#4Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#3)
Re: Different results in a loop with RECORD vs ROWTYPE...

My best guess is there's a bug in exec_assign_value in that it
doesn't handle assigning to PLPGSQL_DTYPE_ROW... but I'm not
seeing anything in the error logs and there aren't any errors
being thrown.

Can you give us a test case now that you understand the issue
better?

Sweet! Yes! :-] Only took me forever and a day to get it down small
enough without cutting too much. ::sigh:: What a relief. Alright, my
best guess is that this is a memory problem. If you remove column w
from the table s.c, everything works with ROWTYPE. Why that would
affect this, I have no clue, but it does. Use a RECORD type, and
you're good to go. ::shrug:: Anyway, I've attached the test case.
Change to type RECORD if you'd like to verify that things work.
Please feel free to ask questions. -sc

#### ORIGINAL EMAIL before test case ####

Unfortunately no. I'm working on trimming away at this package's
schema to try and get to the lowest common denominator... and haven't
been able to come up with anything conclusive. I got it down to less
than 200 lines, then when I went to trim a little more, I went too far
in my cutting and the backend did the right thing. What's worse, is I
cut an adjacent table that had nothing to do with the table/procedure
in question. Anyway, when I went to use my ~ backup file, the backup
file was nearly from the start back at nearly 30K lines. ::sigh::
Working from the ground up wasn't yielding any useful results either.
:( I'm going to take another shot at it later and manually make
backups, but I've blown my whole day trying to track this down one way
or another and need to finish up some other things 1st.

I fired up gdb though and have traced through the execution of the
backend and have been able to generate two different execution paths
for a RECORD and ROWTYPE. I went through and read the two paths side
by side, but I don't understand enough of the backend to be able to
use the information in the structures effectively. My biggest problem
is I don't know what data/variables I should be watching out for and
what's superfluous. In my simplified test case, things work
correctly. When I use my regression tests + ROWTYPE, things bomb
miserably. If I use RECORD and I'm just fine.

Tom, maybe you have some advice with regards to what I can look for.
The select statement is returning the right data, that much I can
verify with gdb. What I'm lost on is figuring out where the value for
the datum is stored. I'm wondering if this isn't a BIGINT issue, but
like I said, I haven't been able to reproduce this problem with a
BIGINT data type in my tables. Because of that inability to reproduce
this on a smaller scale, I'm worried it's a memory management problem.
Internally, the attype is correctly listed as 20 in all of the
structures, from what I've observed... but somewhere along the way,
the datum is being returned as NULL in the case of a ROWTYPE and not a
RECORD.

Any other ideas? -sc

--
Sean Chittenden

Attachments:

test.sqltext/plain; charset=us-asciiDownload
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#4)
Re: Different results in a loop with RECORD vs ROWTYPE...

Sean Chittenden <sean@chittenden.org> writes:

CREATE TABLE s.c (
x BIGINT NOT NULL,
y BIGINT NOT NULL,
w INT NOT NULL DEFAULT 1::INT
);

DECLARE
r_c s.c%ROWTYPE; -- RECORD;
BEGIN
FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
PERFORM s.add_y_to_x(r_c.y,NEW.z);

It seems to me that the rowtype of this SELECT's result is (y bigint).
When you declare r_c as RECORD, it adopts that rowtype, and so the
reference to r_c.y in the PERFORM delivers the value you want. But
when you declare r_c as s.c%ROWTYPE, that is (x bigint, y bigint, w int),
the result of the SELECT's first column is delivered into r_c.x and then
the other two columns are set to null. So r_c.y is null in the PERFORM.

I think this is basically pilot error, though one could certainly argue
that the system ought to be complaining that the SELECT didn't deliver
enough columns to fill the rowtype variable. Any thoughts? Can anyone
report what Oracle's pl/sql does in comparable situations?

regards, tom lane

#6Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#5)
Re: Different results in a loop with RECORD vs ROWTYPE...

CREATE TABLE s.c (
x BIGINT NOT NULL,
y BIGINT NOT NULL,
w INT NOT NULL DEFAULT 1::INT
);

DECLARE
r_c s.c%ROWTYPE; -- RECORD;
BEGIN
FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
PERFORM s.add_y_to_x(r_c.y,NEW.z);

It seems to me that the rowtype of this SELECT's result is (y bigint).
When you declare r_c as RECORD, it adopts that rowtype, and so the
reference to r_c.y in the PERFORM delivers the value you want. But
when you declare r_c as s.c%ROWTYPE, that is (x bigint, y bigint, w int),
the result of the SELECT's first column is delivered into r_c.x and then
the other two columns are set to null. So r_c.y is null in the PERFORM.

I think this is basically pilot error, though one could certainly argue
that the system ought to be complaining that the SELECT didn't deliver
enough columns to fill the rowtype variable. Any thoughts?

Oooh, if indeed that is the way that things are implemented, then yes,
that is pilot error. I should submit some doco to that effect because
that would have been most useful to know upfront.

I was under the impression that a ROWTYPE was basically akin to a C
structure that represented a ROW from the specified table. Each
column was a pointer to the datum returned by the SELECT. Therefore,
if r_c is defined as s.c%ROWTYPE, then r_c.x, r_c.y, and r_c.w would
all be initialized to NULLs until the FOR r_c IN SELECT populated the
values of the r_c structure, with r_c.y mapping to d.y. Granted the
mapping would break down instantly if the SELECT was rewritten as:

FOR r_c IN SELECT d.y AS x...

but I'd think that'd be a powerful feature that could be easily
abused, but very useful if indeed ROWTYPEs were just pointers to the
returned datums... instead, datums are copied, something I was not
wild to discover. I thought everything was done by reference in
pl/pgsql.

Are there any pl/pgsql -> C converters?

-sc

--
Sean Chittenden

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sean Chittenden (#6)
Re: Different results in a loop with RECORD vs ROWTYPE...

Sean Chittenden <sean@chittenden.org> writes:

CREATE TABLE s.c (
x BIGINT NOT NULL,
y BIGINT NOT NULL,
w INT NOT NULL DEFAULT 1::INT
);

DECLARE
r_c s.c%ROWTYPE; -- RECORD;
BEGIN
FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
PERFORM s.add_y_to_x(r_c.y,NEW.z);

I was under the impression that a ROWTYPE was basically akin to a C
structure that represented a ROW from the specified table.

Indeed, but your SELECT doesn't deliver a ROW from the specified table.
It only delivers one column. If you'd said "SELECT * FROM s.c" then
things would have worked as you expect. But in the above command, the
column matching is positional, and so it's r_c.x not r_c.y that gets
loaded with the sole column supplied by the SELECT.

I don't think that the choice of positional matching is wrong, and in
any case we couldn't change it without breaking a lot of existing
plpgsql code. Arguably it should be an error to supply the wrong number
of columns to fill a rowtype result variable, though.

regards, tom lane

#8Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#7)
Re: Different results in a loop with RECORD vs ROWTYPE...

Tom,

I believe that we should raise an exception if the SELECT statement does not
match the ROWTYPE, though of course we'd have to provide a
backward-compatible GUC in case anyone is counting on the current behavior.

Personally, since %ROWTYPE has no efficiency gain in PL/pgSQL over RECORD, I
seldom use it, though I use RECORD about 200 times per project.

If you're still interested, I will consult my PL/SQL bible this afternoon to
see what Oracle 8i does in this case.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#9Sean Chittenden
sean@chittenden.org
In reply to: Tom Lane (#7)
Re: Different results in a loop with RECORD vs ROWTYPE...

CREATE TABLE s.c (
x BIGINT NOT NULL,
y BIGINT NOT NULL,
w INT NOT NULL DEFAULT 1::INT
);

DECLARE
r_c s.c%ROWTYPE; -- RECORD;
BEGIN
FOR r_c IN SELECT d.y FROM s.c d WHERE d.x = NEW.x LOOP
PERFORM s.add_y_to_x(r_c.y,NEW.z);

I was under the impression that a ROWTYPE was basically akin to a C
structure that represented a ROW from the specified table.

Indeed, but your SELECT doesn't deliver a ROW from the specified
table. It only delivers one column. If you'd said "SELECT * FROM
s.c" then things would have worked as you expect. But in the above
command, the column matching is positional, and so it's r_c.x not
r_c.y that gets loaded with the sole column supplied by the SELECT.

I don't think that the choice of positional matching is wrong, and
in any case we couldn't change it without breaking a lot of existing
plpgsql code. Arguably it should be an error to supply the wrong
number of columns to fill a rowtype result variable, though.

:-/ I would argue differently for the below points, but it's kinda
mute given RECORD does the job.

*) Explicitly using ROWTYPE is like using a statically declared
language. In the event that the schema changes, the stored
procedure would break and with good cause. Using a RECORD type
obviates this possibility and things may silently continue to work
with differing results.

*) SELECT * into a ROWTYPE is an expensive practice and I would argue
horribly inefficient in that an entire row has been lifted off of
disk, copied from kernel to user space, and then gets copied into the
resulting ROWTYPE. Minimizing this I would think would be of great
interest to DBAs where memory and disk cycles are precious. If
SELECT'ing a single row, or omitting specific columns from a row
throws off the placement of data into ROWTYPEs because it
sequentially places data into the ROWTYPE var, the logic of placing
data into ROWTYPEs is incorrect and that placement of data into a
ROWTYPE should be done by matching the name of the resulting column
from the SELECT into the corresponding name of the element in the
ROWTYPE. The name of an element in a ROWTYPE and table is guaranteed
to be unique by their very nature.

Anyway, my thoughts in the event that pl/pgsql gets its famed overhaul
on the script side.

*shrugs* Given the large amount of pl/pgsql code that I've got running
around, I'm slowly (2-3mo completion time it's looking like) working
on have pl/pgsql scripts compiled to C and then .so's automatically
loaded from their sequentially numbered names stored in the
data/plpgsql. Recompiling the .so between version dumps isn't an
issue because the whole db has to be reimported. If anyone's got any
advice on the topic, I'm all ears, but that's the direction I'm
working toward to help solve some of the inefficiencies in pl/pgsql.
I've nabbed the gram.y and scan.l files from plpgsql so scripts should
be 100% compatible. I figure every BSD and likely every linux server
box likely has a compiler on board, same with Slowaris if they're
running postgresql and are looking for speed. But, if they don't,
then they can use plpgsql instead of plpgsqlc. FWIW, compiler flags
and variables are stored in system catalogs so those can be changed
(Tom, any preferences on a system catalog name for compiler bits, or
should I convert things to abuse the GUC infrastructure). I'm
unconvinced that there is a need for a way to recompile a function
other than DROP'ing it and re-CREATE'ing it. Any need for an ALTER
FUNCTION s.f() RECOMPILE? Since .so's are mmpap()'ed on almost every
system, this is also of memory interest to folks, never mind the
speed.

-sc

--
Sean Chittenden

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#8)
Re: Different results in a loop with RECORD vs ROWTYPE...

Josh Berkus <josh@agliodbs.com> writes:

I believe that we should raise an exception if the SELECT statement does not
match the ROWTYPE, though of course we'd have to provide a
backward-compatible GUC in case anyone is counting on the current behavior.

Digging in the code, I see this comment in exec_move_row:

* NOTE: this code used to demand row->nfields == tup->t_data->t_natts,
* but that's wrong. The tuple might have more fields than we
* expected if it's from an inheritance-child table of the current
* table, or it might have fewer if the table has had columns added by
* ALTER TABLE. Ignore extra columns and assume NULL for missing
* columns, the same as heap_getattr would do.

So blindly restoring the column-count check will break things. I think
that the above considerations only apply to one of the call sites of
exec_move_row, so we could make the other ones apply the check, but
clearly it's a little more ticklish than one would think.

If you're still interested, I will consult my PL/SQL bible this afternoon to
see what Oracle 8i does in this case.

Since plpgsql is generally supposed to be a slavish imitation of Oracle,
it would be good to know what they do...

regards, tom lane

#11Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#10)
Re: Different results in a loop with RECORD vs ROWTYPE...

Tom,

If you're still interested, I will consult my PL/SQL bible this afternoon

to

see what Oracle 8i does in this case.

Since plpgsql is generally supposed to be a slavish imitation of Oracle,
it would be good to know what they do...

Unfortunately, PL/PQL Programming, 2nd Edition, does not make it clear what
happens if you select the wrong columns for a given %ROWTYPE. Like PL/pgSQL,
PL/SQL Programming encourages the use of RECORDs rather than %ROWTYPE.

I'll post to the SQL list to see if any of the Oracle hackers there knows.

--
-Josh Berkus
Aglio Database Solutions
San Francisco