checking my understanding of TupleDesc

Started by Chapman Flack almost 7 years ago | 7 messages | hackers

chap@anastigmatix.net

almost 7 years ago

From looking around the code, I've made these tentative observations
about TupleDescs:

1. If the TupleDesc was obtained straight from the relcache for some
relation, then all of its attributes should have nonzero attrelid
identifying that relation, but in (every? nearly every?) other case,
the attributes found in a TupleDesc will have a dummy attrelid of zero.

2. The attributes in a TupleDesc will (always?) have consecutive attnum
corresponding to their positions in the TupleDesc (and therefore
redundant). A query, say, that projects out a subset of columns
from a relation will not have a result TupleDesc with attributes
still bearing their original attrelid and attnum; they'll have
attrelid zero and consecutive renumbered attnum.

Something like SendRowDescriptionCols_3 that wants the original table
and attnum has to reconstruct them from the targetlist if available,

Have I mistaken any of that?

Thanks,
-Chap

Chapman Flack

chap@anastigmatix.net

over 6 years ago

In reply to: Chapman Flack (#1)

Re: checking my understanding of TupleDesc

On 09/29/19 20:13, Chapman Flack wrote:

> From looking around the code, I've made these tentative observations
> about TupleDescs:

> 1. If the TupleDesc was obtained straight from the relcache for some
> relation, then all of its attributes should have nonzero attrelid
> identifying that relation, but in (every? nearly every?) other case,
> the attributes found in a TupleDesc will have a dummy attrelid of zero.

> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
> corresponding to their positions in the TupleDesc (and therefore
> redundant). A query, say, that projects out a subset of columns
> from a relation will not have a result TupleDesc with attributes
> still bearing their original attrelid and attnum; they'll have
> attrelid zero and consecutive renumbered attnum.

> Something like SendRowDescriptionCols_3 that wants the original table
> and attnum has to reconstruct them from the targetlist if available,

> Have I mistaken any of that?

And one more:

3. One could encounter a TupleDesc with one or more 'attisdropped'
attributes, which do have their original attnums (corresponding
to their positions in the TupleDesc and therefore redundant),
so the attnums of nondropped attributes may be discontiguous.
In building a corresponding tuple, any dropped attribute should
have its null flag set.
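
A minimal sketch of observation 3, using a hypothetical `MockAttr` struct rather than the real PostgreSQL headers. In backend code the same loop would presumably read `TupleDescAttr(desc, i)->attisdropped` and fill the `isnull` array later passed to `heap_form_tuple`; the names below are illustrative assumptions only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one pg_attribute-style entry. */
typedef struct MockAttr
{
    unsigned int attrelid;     /* 0 when not tied to a relation */
    short        attnum;       /* 1-based position in the descriptor */
    bool         attisdropped; /* column has been dropped */
} MockAttr;

/*
 * Set the null flag for every dropped attribute and return how many
 * were found. Flags of live attributes are left untouched.
 */
static int
mark_dropped_as_null(const MockAttr *attrs, int natts, bool *isnull)
{
    int ndropped = 0;

    for (int i = 0; i < natts; i++)
    {
        /* Observation 2: attnum mirrors the position in the descriptor. */
        assert(attrs[i].attnum == i + 1);

        if (attrs[i].attisdropped)
        {
            isnull[i] = true;
            ndropped++;
        }
    }
    return ndropped;
}
```

The dropped entry keeps its slot, so the live attributes on either side of it have attnums that skip over it.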

Is it simple to say under what circumstances a TupleDesc possibly
with dropped members could be encountered, and under what other
circumstances one would only encounter 'cleaned up' TupleDescs with
no dropped attributes, and contiguous numbers for the real ones?

Regards,
-Chap

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Chapman Flack (#2)

Re: checking my understanding of TupleDesc

Chapman Flack <chap@anastigmatix.net> writes:

> On 09/29/19 20:13, Chapman Flack wrote:

>> From looking around the code, I've made these tentative observations
>> about TupleDescs:

>> 1. If the TupleDesc was obtained straight from the relcache for some
>> relation, then all of its attributes should have nonzero attrelid
>> identifying that relation, but in (every? nearly every?) other case,
>> the attributes found in a TupleDesc will have a dummy attrelid of zero.

I'm not sure about every vs. nearly every, but otherwise this seems
accurate. Generally attrelid is meaningful in a pg_attribute catalog
entry, but not in TupleDescs in memory. It appears valid in relcache
entry tupdescs only because they are built straight from pg_attribute.
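
To make the attrelid point concrete, here is a small sketch on a hypothetical `MockAttr` struct (not the real `Form_pg_attribute`). It only illustrates the distinction described above: a relcache-style descriptor carries one nonzero attrelid throughout, while other in-memory descriptors typically carry zero. Since "every vs. nearly every" is left open above, treat the result as a hint and not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one attribute entry. */
typedef struct MockAttr
{
    unsigned int attrelid; /* owning relation's OID, or 0 */
    short        attnum;   /* 1-based position */
} MockAttr;

/*
 * Return the relation OID shared by all attributes, or 0 if the
 * descriptor is empty, the attributes disagree, or attrelid is the
 * dummy zero.
 */
static unsigned int
common_attrelid(const MockAttr *attrs, int natts)
{
    assert(attrs != NULL || natts == 0);

    if (natts <= 0)
        return 0;

    unsigned int relid = attrs[0].attrelid;

    for (int i = 1; i < natts; i++)
    {
        if (attrs[i].attrelid != relid)
            return 0;
    }
    return relid;
}
```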

>> 2. The attributes in a TupleDesc will (always?) have consecutive attnum
>> corresponding to their positions in the TupleDesc (and therefore
>> redundant).

Correct.

> And one more:

> 3. One could encounter a TupleDesc with one or more 'attisdropped'
> attributes, which do have their original attnums (corresponding
> to their positions in the TupleDesc and therefore redundant),
> so the attnums of nondropped attributes may be discontiguous.

Right.

> Is it simple to say under what circumstances a TupleDesc possibly
> with dropped members could be encountered,

Any tupdesc that's describing the rowtype of a table with dropped columns
would look like that.

> and under what other
> circumstances one would only encounter 'cleaned up' TupleDescs with
> no dropped attributes, and contiguous numbers for the real ones?

I don't believe we ever include dropped columns in a projection result,
so generally speaking, the output of a query plan node wouldn't have them.

There's a semi-exception, which is that the planner might decide that we
can skip a projection step for the output of a table scan node, in which
case dropped columns would be included in its output. But that would only
be true if there are upper plan nodes that are doing some projections of
their own. The final query output will definitely not have them.
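
As a rough illustration of what a projection does to a descriptor, the sketch below copies selected columns from a hypothetical `MockAttr` array, zeroes attrelid, and renumbers attnum consecutively. Refusing dropped columns mirrors the statement that projection results do not include them. None of these names come from the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one attribute entry. */
typedef struct MockAttr
{
    unsigned int attrelid;     /* owning relation's OID, or 0 */
    short        attnum;       /* 1-based position */
    unsigned int atttypid;     /* data type OID */
    bool         attisdropped; /* column has been dropped */
} MockAttr;

/*
 * Build a projected descriptor from the 1-based source column numbers
 * in cols[]. Returns the number of output attributes, or -1 if a
 * column is out of range or dropped.
 */
static int
project_attrs(const MockAttr *src, int natts,
              const int *cols, int ncols, MockAttr *out)
{
    assert(src != NULL && cols != NULL && out != NULL);

    for (int i = 0; i < ncols; i++)
    {
        int idx = cols[i] - 1;

        if (idx < 0 || idx >= natts || src[idx].attisdropped)
            return -1;

        out[i] = src[idx];
        out[i].attrelid = 0;              /* no longer tied to the table */
        out[i].attnum = (short) (i + 1);  /* renumbered by position */
    }
    return ncols;
}
```

After this step the original table and column number are gone from the descriptor, which is why anything that needs them has to look at the targetlist instead.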

regards, tom lane

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Tom Lane (#3)

Re: checking my understanding of TupleDesc

Hi,

On 2019-11-12 17:39:20 -0500, Tom Lane wrote:

> > and under what other
> > circumstances one would only encounter 'cleaned up' TupleDescs with
> > no dropped attributes, and contiguous numbers for the real ones?

> I don't believe we ever include dropped columns in a projection result,
> so generally speaking, the output of a query plan node wouldn't have them.

> There's a semi-exception, which is that the planner might decide that we
> can skip a projection step for the output of a table scan node, in which
> case dropped columns would be included in its output. But that would only
> be true if there are upper plan nodes that are doing some projections of
> their own. The final query output will definitely not have them.

I *think* we don't even do that, because build_physical_tlist() bails
out if there's a dropped (or missing) column. Or are you thinking of
something else?

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Andres Freund (#4)

Re: checking my understanding of TupleDesc

Andres Freund <andres@anarazel.de> writes:

> On 2019-11-12 17:39:20 -0500, Tom Lane wrote:

>> There's a semi-exception, which is that the planner might decide that we
>> can skip a projection step for the output of a table scan node, in which
>> case dropped columns would be included in its output. But that would only
>> be true if there are upper plan nodes that are doing some projections of
>> their own. The final query output will definitely not have them.

> I *think* we don't even do that, because build_physical_tlist() bails
> out if there's a dropped (or missing) column.

Ah, right. Probably because we need to insist on every column of an
execution-time tupdesc having a valid atttypid ... although I wonder,
is that really necessary?

regards, tom lane

Andres Freund

andres@anarazel.de

over 6 years ago

In reply to: Tom Lane (#5)

Re: checking my understanding of TupleDesc

Hi,

On 2019-11-12 18:20:56 -0500, Tom Lane wrote:

> Andres Freund <andres@anarazel.de> writes:

> > On 2019-11-12 17:39:20 -0500, Tom Lane wrote:

> > > There's a semi-exception, which is that the planner might decide that we
> > > can skip a projection step for the output of a table scan node, in which
> > > case dropped columns would be included in its output. But that would only
> > > be true if there are upper plan nodes that are doing some projections of
> > > their own. The final query output will definitely not have them.

> > I *think* we don't even do that, because build_physical_tlist() bails
> > out if there's a dropped (or missing) column.

> Ah, right. Probably because we need to insist on every column of an
> execution-time tupdesc having a valid atttypid ... although I wonder,
> is that really necessary?

Yea, the stated reasoning is ExecTypeFromTL():
*
* Exception: if there are any dropped or missing columns, we punt and return
* NIL. Ideally we would like to handle these cases too. However this
* creates problems for ExecTypeFromTL, which may be asked to build a tupdesc
* for a tlist that includes vars of no-longer-existent types. In theory we
* could dig out the required info from the pg_attribute entries of the
* relation, but that data is not readily available to ExecTypeFromTL.
* For now, we don't apply the physical-tlist optimization when there are
* dropped cols.
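
The bail-out described in that comment can be sketched as a simple predicate. This is a simplified model on a hypothetical `MockAttr` struct, assuming the check amounts to "any dropped or missing column disables the optimization"; it is not the actual build_physical_tlist() code, which returns NIL for the targetlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one attribute entry. */
typedef struct MockAttr
{
    short attnum;        /* 1-based position */
    bool  attisdropped;  /* column has been dropped */
    bool  atthasmissing; /* column relies on a stored "missing" default */
} MockAttr;

/*
 * Return true if a physical targetlist could be used for this
 * relation, false if any column is dropped or missing.
 */
static bool
physical_tlist_allowed(const MockAttr *attrs, int natts)
{
    assert(attrs != NULL || natts == 0);

    for (int i = 0; i < natts; i++)
    {
        if (attrs[i].attisdropped || attrs[i].atthasmissing)
            return false; /* punt, as the comment describes */
    }
    return true;
}
```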

I think the main problem is that we don't even have a convenient way to
identify that a targetlist expression is actually a dropped column, and
treat that differently. If we were to expand physical tlists to cover
dropped and missing columns, we'd need to be able to add error checks to
at least ExecInitExprRec, and to printtup_prepare_info().

I wonder if we could get away with making build_physical_tlist()
return a TargetEntry for a Const instead of a Var for the dropped
columns? That'd contain enough information for tuple deforming to work
on higher query levels? Or perhaps we ought to invent a DroppedVar
node, that includes the type information? That'd make it trivial to
error out when such an expression is actually evaluated, and allow us to
detect such columns. We already put Const nodes in some places like
that IIRC...

Greetings,

Andres Freund

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Andres Freund (#6)

Re: checking my understanding of TupleDesc

Andres Freund <andres@anarazel.de> writes:

> On 2019-11-12 18:20:56 -0500, Tom Lane wrote:

>> Ah, right. Probably because we need to insist on every column of an
>> execution-time tupdesc having a valid atttypid ... although I wonder,
>> is that really necessary?

> Yea, the stated reasoning is ExecTypeFromTL():
> [ ExecTypeFromTL needs to see subexpressions with valid data types ]

> I wonder if we could get away with making build_physical_tlist()
> return a TargetEntry for a Const instead of a Var for the dropped
> columns? That'd contain enough information for tuple deforming to work
> on higher query levels? Or perhaps we ought to invent a DroppedVar
> node, that includes the type information? That'd make it trivial to
> error out when such an expression is actually evaluated, and allow us to
> detect such columns. We already put Const nodes in some places like
> that IIRC...

Yeah, a DroppedVar thing might not be a bad idea, it could substitute
for the dummy null constants we currently use. Note that an interesting
property of such a node is that it doesn't actually *have* a type.
A dropped column might be of a type that's been dropped too (and,
if memory serves, we reset the column's atttypid to zero anyway).
What we'd have to do is excavate attlen and attalign from the
pg_attribute entry and store those in the DroppedVar node. Then,
anything reconstructing a tupdesc would have to use those fields
and avoid a pg_type lookup.

I'm not sure whether the execution-time behavior of such a node
ought to be "throw error" or just "return NULL". The precedent
of the dummy constants suggests the latter. What would error out
is anything that wants to extract an actual type OID from the
expression.
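
A purely hypothetical sketch of what such a node might carry. No DroppedVar node exists in PostgreSQL; the struct, its field names, and both helpers below are assumptions made for illustration. It shows the two properties discussed: a descriptor entry can be rebuilt from the stored length and alignment without a type lookup, and evaluation follows the "return NULL" option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: a dropped column has no type OID, only layout info. */
typedef struct DroppedVarSketch
{
    short attlen;   /* storage length copied from the pg_attribute entry */
    char  attalign; /* alignment code copied from the pg_attribute entry */
} DroppedVarSketch;

/* Hypothetical, simplified stand-in for one descriptor entry. */
typedef struct MockAttr
{
    unsigned int atttypid;     /* 0 means "no valid type" */
    short        attlen;
    char         attalign;
    short        attnum;
    bool         attisdropped;
} MockAttr;

/* Rebuild a descriptor entry from the node alone, with no type lookup. */
static MockAttr
attr_from_dropped_var(const DroppedVarSketch *dv, short attnum)
{
    MockAttr att;

    assert(dv != NULL);
    att.atttypid = 0; /* the type may itself have been dropped */
    att.attlen = dv->attlen;
    att.attalign = dv->attalign;
    att.attnum = attnum;
    att.attisdropped = true;
    return att;
}

/* "Return NULL" variant of execution-time behavior. */
static const void *
eval_dropped_var(const DroppedVarSketch *dv, bool *isnull)
{
    assert(dv != NULL && isnull != NULL);
    *isnull = true;
    return NULL;
}
```

Under the "throw error" alternative, `eval_dropped_var` would report an error instead of setting the null flag; the descriptor-rebuilding half would stay the same.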

regards, tom lane