Possible bug with row_to_json

Started by Jack Christensenover 12 years ago7 messages
#1Jack Christensen
jack@jackchristensen.com

When using a subquery as a source for row_to_json, depending on the
order of arguments it may ignore renaming a column.

jack=# create table player(
jack(# player_id serial primary key,
jack(# name varchar not null unique
jack(# );
NOTICE: CREATE TABLE will create implicit sequence
"player_player_id_seq" for serial column "player.player_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"player_pkey" for table "player"
NOTICE: CREATE TABLE / UNIQUE will create implicit index
"player_name_key" for table "player"
CREATE TABLE
jack=# insert into player(name) values('Jack');
INSERT 0 1
jack=# select row_to_json(t)
jack-# from (
jack(# select player_id as renamed, name
jack(# from player
jack(# order by name
jack(# ) t;
row_to_json
-------------------------------
{"player_id":1,"name":"Jack"}
(1 row)

It ignored the rename.

jack=# select row_to_json(t)
from (
select name, player_id as renamed
from player
order by name
) t;
row_to_json
-----------------------------
{"name":"Jack","renamed":1}
(1 row)

But here it didn't.

Is this a bug?

Jack Christensen

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Jack Christensen (#1)
Re: Possible bug with row_to_json

On Mon, Aug 5, 2013 at 5:15 PM, Jack Christensen
<jack@jackchristensen.com> wrote:

When using a subquery as a source for row_to_json, depending on the order of
arguments it may ignore renaming a column.

jack=# create table player(
jack(# player_id serial primary key,
jack(# name varchar not null unique
jack(# );
NOTICE: CREATE TABLE will create implicit sequence "player_player_id_seq"
for serial column "player.player_id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "player_pkey"
for table "player"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "player_name_key"
for table "player"
CREATE TABLE
jack=# insert into player(name) values('Jack');
INSERT 0 1
jack=# select row_to_json(t)
jack-# from (
jack(# select player_id as renamed, name
jack(# from player
jack(# order by name
jack(# ) t;
row_to_json
-------------------------------
{"player_id":1,"name":"Jack"}
(1 row)

It ignored the rename.

jack=# select row_to_json(t)
from (
select name, player_id as renamed
from player
order by name
) t;
row_to_json
-----------------------------
{"name":"Jack","renamed":1}
(1 row)

But here it didn't.

Is this a bug?

yup.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jack Christensen (#1)
1 attachment(s)
Re: [GENERAL] Possible bug with row_to_json

Jack Christensen <jack@jackchristensen.com> writes:

jack=# create table player(
jack(# player_id serial primary key,
jack(# name varchar not null unique
jack(# );
CREATE TABLE
jack=# insert into player(name) values('Jack');
INSERT 0 1
jack=# select row_to_json(t)
jack-# from (
jack(# select player_id as renamed, name
jack(# from player
jack(# order by name
jack(# ) t;
row_to_json
-------------------------------
{"player_id":1,"name":"Jack"}
(1 row)

It ignored the rename.

I looked into this and found that the culprit is the optimization that
skips ExecProject() if a scan plan node is not doing any useful
projection. In the plan for this query:

Subquery Scan on t (cost=0.15..81.98 rows=1230 width=60)
Output: row_to_json(t.*)
-> Index Scan using player_name_key on public.player (cost=0.15..66.60 rows=1230 width=36)
Output: player.player_id, player.name

the indexscan node decides it can just return its "scan tuple" slot
directly to the caller, rather than projecting into the "result tuple"
slot. The problem is that the scan tuple slot's tuple descriptor is that
of the table being scanned, and in particular it has the actual column
names of the underlying table, plus a tdtypeid matching the table's
rowtype OID. This info is what's looked at by ExecEvalWholeRowVar to form
a tuple datum, so the datum ends up looking like it has exactly the
table's rowtype, and then row_to_json is going to be given the table's
tuple descriptor when it asks lookup_rowtype_tupdesc for a tupdesc.

The attached quick-hack patch fixes the reported case. I *think* it's
safe as is, but I wonder whether we ought to install additional checks
to verify physical compatibility of the two tuple descriptors before
we decide it's okay to skip projection. Another potential issue is
that we're losing whatever benefit there may be in having tuple datums
marked with named rowtypes rather than RECORD --- I'm not sure there's
any meaningful performance difference, but I'm not sure there isn't,
either. We could be even more paranoid by choosing to only install
the other descriptor if there's actually a difference in column names;
but is it worth the extra complexity?

Comments?

regards, tom lane

Attachments:

install-correct-descriptor.patchtext/x-diff; charset=us-ascii; name=install-correct-descriptor.patchDownload
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index 3ea6460..b91ae03 100644
*** a/src/backend/executor/execScan.c
--- b/src/backend/executor/execScan.c
*************** ExecScan(ScanState *node,
*** 234,246 ****
   *		Set up projection info for a scan node, if necessary.
   *
   * We can avoid a projection step if the requested tlist exactly matches
!  * the underlying tuple type.  If so, we just set ps_ProjInfo to NULL.
   * Note that this case occurs not only for simple "SELECT * FROM ...", but
   * also in most cases where there are joins or other processing nodes above
   * the scan node, because the planner will preferentially generate a matching
   * tlist.
   *
!  * ExecAssignScanType must have been called already.
   */
  void
  ExecAssignScanProjectionInfo(ScanState *node)
--- 234,253 ----
   *		Set up projection info for a scan node, if necessary.
   *
   * We can avoid a projection step if the requested tlist exactly matches
!  * the underlying tuple type.  If so, we set ps_ProjInfo to NULL to inform
!  * later processing that the scan tuple can be returned as-is.  We also
!  * have to tweak the scan tuple slot to have the exact same tuple descriptor
!  * the projection slot does --- they are physically compatible, if the tlist
!  * matches, but the projection slot might contain different column names and
!  * we want anything that looks at the slot tupdesc to see the right names.
!  *
   * Note that this case occurs not only for simple "SELECT * FROM ...", but
   * also in most cases where there are joins or other processing nodes above
   * the scan node, because the planner will preferentially generate a matching
   * tlist.
   *
!  * ExecAssignScanType and ExecAssignResultTypeFromTL must have been called
!  * already.
   */
  void
  ExecAssignScanProjectionInfo(ScanState *node)
*************** ExecAssignScanProjectionInfo(ScanState *
*** 258,264 ****
--- 265,277 ----
  							  scan->plan.targetlist,
  							  varno,
  							  node->ss_ScanTupleSlot->tts_tupleDescriptor))
+ 	{
+ 		/* no physical projection needed */
  		node->ps.ps_ProjInfo = NULL;
+ 		/* make sure scan tuple slot's tupdesc exposes the right names */
+ 		ExecSetSlotDescriptor(node->ss_ScanTupleSlot,
+ 							  node->ps.ps_ResultTupleSlot->tts_tupleDescriptor);
+ 	}
  	else
  		ExecAssignProjectionInfo(&node->ps,
  								 node->ss_ScanTupleSlot->tts_tupleDescriptor);
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: [GENERAL] Possible bug with row_to_json

I wrote:

Jack Christensen <jack@jackchristensen.com> writes:

It ignored the rename.

I looked into this and found that the culprit is the optimization that
skips ExecProject() if a scan plan node is not doing any useful
projection.

Further poking at this issue shows that there are related behaviors that
aren't fixed by my proposed patch. The original complaint can be
replicated in the regression database like this:

select row_to_json(i8) from (select q1 as a, q2 from int8_tbl offset 0) i8;

where we'd expect row_to_json to emit column names "a"/"q2" but we
actually get "q1"/"q2". But consider this variant:

select row_to_json(i8) from (select q1,q2 from int8_tbl offset 0) i8(x,y);

Arguably, this should show column names x/y but what you get is q1/q2,
even with my patch. Related cases include

select row_to_json(v) from (values(1,2) limit 1) v(x,y);
select row_to_json((select i8 from int8_tbl i8(x,y) limit 1));

In the first two of those, the planner isn't bothering to install the
column aliases into the plan's target lists. While we could fix that,
it wouldn't help the last case, where the whole-row Var for "int8_tbl"
is evaluated at scan level; the code for that is looking at the relation's
tuple descriptor not the scan node's result descriptor. I'm not even
sure what a clean fix for that case would look like.

Since this behavior can also be demonstrated in 9.2 (and maybe further
back using xml features?), I don't think we should consider it a
blocker bug for 9.3. I'm planning to set it on the back burner for
the moment and go worry about the planner's LATERAL bugs.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#4)
Re: [GENERAL] Possible bug with row_to_json

On Tue, Aug 13, 2013 at 4:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Since this behavior can also be demonstrated in 9.2 (and maybe further
back using xml features?), I don't think we should consider it a
blocker bug for 9.3. I'm planning to set it on the back burner for
the moment and go worry about the planner's LATERAL bugs.

+1

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: [GENERAL] Possible bug with row_to_json

On Tue, Aug 13, 2013 at 05:34:12PM -0400, Tom Lane wrote:

Further poking at this issue shows that there are related behaviors that
aren't fixed by my proposed patch. The original complaint can be
replicated in the regression database like this:

select row_to_json(i8) from (select q1 as a, q2 from int8_tbl offset 0) i8;

where we'd expect row_to_json to emit column names "a"/"q2" but we
actually get "q1"/"q2". But consider this variant:

select row_to_json(i8) from (select q1,q2 from int8_tbl offset 0) i8(x,y);

Arguably, this should show column names x/y but what you get is q1/q2,
even with my patch. Related cases include

select row_to_json(v) from (values(1,2) limit 1) v(x,y);
select row_to_json((select i8 from int8_tbl i8(x,y) limit 1));

In the first two of those, the planner isn't bothering to install the
column aliases into the plan's target lists. While we could fix that,
it wouldn't help the last case, where the whole-row Var for "int8_tbl"
is evaluated at scan level; the code for that is looking at the relation's
tuple descriptor not the scan node's result descriptor. I'm not even
sure what a clean fix for that case would look like.

Since this behavior can also be demonstrated in 9.2 (and maybe further
back using xml features?), I don't think we should consider it a
blocker bug for 9.3. I'm planning to set it on the back burner for
the moment and go worry about the planner's LATERAL bugs.

Where are we on this? I still see the failure:

regression=> select row_to_json(i8) from (select q1 as a, q2 from int8_tbl offset 0) i8;
row_to_json
------------------------------------------------
{"q1":123,"q2":456}
{"q1":123,"q2":4567890123456789}
{"q1":4567890123456789,"q2":123}
{"q1":4567890123456789,"q2":4567890123456789}
{"q1":4567890123456789,"q2":-4567890123456789}

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: [GENERAL] Possible bug with row_to_json

Bruce Momjian <bruce@momjian.us> writes:

On Tue, Aug 13, 2013 at 05:34:12PM -0400, Tom Lane wrote:

Further poking at this issue shows that there are related behaviors that
aren't fixed by my proposed patch.

Where are we on this?

Still have no idea how to fix the whole-row-Var case. We could fix some
of these behaviors, but I'm inclined to think we should wait, and take
all the related compatibility hits at once.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers