posgres 12 bug (partitioned table)

Started by Pavel Biryukovalmost 6 years ago60 messageshackersbugs
Jump to latest
#1Pavel Biryukov
79166341370@yandex.ru
hackersbugs

<div>Hello!</div><div>On postgres 12.3 the problem still exists (<a href="/messages/by-id/16446-2011a4b103fc5fd1@postgresql.org&quot;&gt;https://www.postgresql.org/message-id/16446-2011a4b103fc5fd1@postgresql.org&lt;/a&gt;):&lt;/div&gt;&lt;div&gt;%C2%A0&lt;/div&gt;&lt;div&gt;(Tested on PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit)</div><div> </div><div><div><div>Bug reference:      16446</div><div>Logged by:          Георгий Драк</div><div>Email address:      sonicgd(at)gmail(dot)com</div><div>PostgreSQL version: 12.2</div><div>Operating system:   Debian 10.3</div><div>Description:        </div><div> </div><div> </div><div>Hello. I'm catch error "virtual tuple table slot does not have system</div><div>attributes" when inserting row into partitioned table with RETURNING xmin;</div><div> </div><div> </div><div>Reproduction.</div><div> </div><div> </div><div>1. Create schema</div><div>CREATE TABLE "tmp"</div><div>(</div><div>    id   bigint generated always as identity,</div><div>    date timestamptz not null,</div><div>    foo  int         not null,</div><div>    PRIMARY KEY ("id", "date")</div><div>)</div><div>    PARTITION BY RANGE ("date");</div><div>CREATE TABLE "tmp_2020" PARTITION OF "tmp" FOR VALUES FROM ('2020-01-01') TO</div><div>('2021-01-01');</div><div> </div><div> </div><div>2. Execute query</div><div>INSERT INTO "tmp" ("date", "foo")</div><div>VALUES (NOW(), 1)</div><div>RETURNING id, xmin;</div><div> </div><div> </div><div>3. Result - ERROR: virtual tuple table slot does not have system</div><div>attributes</div><div> </div><div> </div><div>4. Expected result - id and xmin of inserted row.</div><div> </div></div></div>

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Biryukov (#1)
hackersbugs
Re: posgres 12 bug (partitioned table)

Pavel Biryukov <79166341370@yandex.ru> writes:

Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin

Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.

regards, tom lane

#3David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#2)
hackersbugs
Re: posgres 12 bug (partitioned table)

On Fri, 5 Jun 2020 at 04:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.

Looks like c2fe139c20 is the breaking commit.

David

#4Pavel Biryukov
79166341370@yandex.ru
In reply to: David Rowley (#3)
hackersbugs
Re: posgres 12 bug (partitioned table)

<div>Hello!</div><div> </div><div>Is it going to be fixed? This problem stops us from migrating to 12...</div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>05.06.2020, 03:18, "David Rowley" &lt;dgrowleyml@gmail.com&gt;:</div><blockquote><p>On Fri, 5 Jun 2020 at 04:57, Tom Lane &lt;<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>&gt; wrote:</p><blockquote> Reproduced here. The example works in v11, and in v10 if you remove<br /> the unnecessary-to-the-example primary key, so it seems like a clear<br /> regression. I didn't dig for the cause but I suppose it's related<br /> to Andres' slot-related changes.</blockquote><p><br />Looks like c2fe139c20 is the breaking commit.<br /><br />David</p></blockquote>

#5Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Pavel Biryukov (#4)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hello,

ccing pgsql-hackers@postgresql.org

Upon investigation, it seems that the problem is caused by the
following:

The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot. If there are system
columns other than ctid and tableOid referenced in the RETURNING clause
(not only xmin as in the bug report), it will lead to the ERROR as
mentioned in this thread as virtual tuple table slots don't really store
such columns. (ctid and tableOid are present directly in the
TupleTableSlot struct and can be satisfied from there: refer:
slot_getsysattr()))

I have attached two alternate patches to solve the problem. Both patches
use and share a mechanism to detect if there are any such system
columns. This is done inside ExecBuildProjectionInfo() and we store this
info inside the ProjectionInfo struct. Then based on this info, system
columns are populated in a suitable slot, which is then passed on to
ExecProcessReturning(). (If it is deemed that this operation only be
done for RETURNING, we can just as easily do it in the callsite for
ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead
of doing it inside ExecBuildProjectionInfo())

The first patch [1]v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch explicitly creates a heap tuple table slot, fills in the
system column values as we would do during heap_prepare_insert() and
then passes that slot to ExecProcessReturning(). (We use a heap tuple table
slot as it is guaranteed to support these attributes).

The second patch [2]v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch instead of relying on a heap tuple table slot,
relies on ExecGetReturningSlot() for the right slot and
table_tuple_fetch_row_version() to supply the system column values. It
does make the assumption that the AM would supply a slot that will have
these system columns.

[1]: v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch
[2]: v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch

Regards,
Soumyadeep (VMware)

Attachments:

v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patchapplication/x-patch; name=v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patchDownload+43-2
v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patchapplication/octet-stream; name=v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patchDownload+40-2
#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Soumyadeep Chakraborty (#5)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hi Soumyadeep,

Thanks for picking this up.

On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:

Upon investigation, it seems that the problem is caused by the
following:

The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot.

Actually, not that often in practice. The slot is not virtual, for
example, when inserting into a regular non-partitioned table. Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():

mtstate->mt_scans[i] =
ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),

table_slot_callbacks(resultRelInfo->ri_RelationDesc));

Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation. For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:

if (relation->rd_tableam)
tts_cb = relation->rd_tableam->slot_callbacks(relation);
else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
/*
* Historically FDWs expect to store heap tuples in slots. Continue
* handing them one, to make it less painful to adapt FDWs to new
* versions. The cost of a heap slot over a virtual slot is pretty
* small.
*/
tts_cb = &TTSOpsHeapTuple;
}
else
{
/*
* These need to be supported, as some parts of the code (like COPY)
* need to create slots for such relations too. It seems better to
* centralize the knowledge that a heap slot is the right thing in
* that case here.
*/
Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
tts_cb = &TTSOpsVirtual;
}

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.

I have attached two alternate patches to solve the problem.

IMHO, they are solving the problem at the wrong place. We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with. We could
do what I suggest above or maybe find some other way.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachments:

partitioned-table-heap-slot.patchapplication/octet-stream; name=partitioned-table-heap-slot.patchDownload+3-3
#7Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Amit Langote (#6)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hi Amit,

Thanks for your reply!

On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:

Hi Soumyadeep,

Thanks for picking this up.

On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:

Upon investigation, it seems that the problem is caused by the
following:

The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot.

Actually, not that often in practice. The slot is not virtual, for
example, when inserting into a regular non-partitioned table.

Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.

This is what I had thought of initially but I had taken a step back for 2
reasons:

1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1]https://github.com/greenplum-db/postgres/tree/zedstore, a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.

2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case. However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.

All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.

[1]: https://github.com/greenplum-db/postgres/tree/zedstore

Regards,
Soumyadeep

#8Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Soumyadeep Chakraborty (#7)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hi Soumyadeep,

On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:

On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:

If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.

This is what I had thought of initially but I had taken a step back for 2
reasons:

1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.

Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#9Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Amit Langote (#8)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hey Amit,

On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:

Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.

Regards,
Soumyadeep (VMware)

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Soumyadeep Chakraborty (#9)
hackersbugs
Re: posgres 12 bug (partitioned table)

On Thu, Jul 9, 2020 at 1:53 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:

On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:

Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.

Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.

Well, I was hoping that table_tuple_insert() would fill that info, but
you did say upthread that table AMs are not exactly expected to do so,
so maybe you have a point.

By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#11Soumyadeep Chakraborty
soumyadeep2007@gmail.com
In reply to: Amit Langote (#10)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hey Amit,

On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:

By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?

We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"

A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.

For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).

The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same. The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.

I really wonder what other AMs are doing about these issues.

I think we should either:

1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.

2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.

3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.

For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls, as well as demand a heap slot for partition roots
and interior nodes. And then later on. we would need a larger effort
making all of these APIs not really demand transactional information.
Perhaps the UNDO framework will come to the rescue.

Regards,
Soumyadeep (VMware)

#12Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Soumyadeep Chakraborty (#11)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hi Soumyadeep,

On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:

Hey Amit,

On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:

By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?

We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"

A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.

For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).

The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same. The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.

I really wonder what other AMs are doing about these issues.

I think we should either:

1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.

2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns. We shouldn't really need any
new core code to get the transaction-related system columns while
there exists a perfectly reasonable channel for it to arrive through
-- TupleTableSlots. I suppose there's a reason why we allow AMs to
provide their own slot callbacks.

Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.

3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.

For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls,

As long as the AM's slot_getsysattr() callback returns the correct
value, this works.

as well as demand a heap slot for partition roots
and interior nodes.

It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#13Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#12)
hackersbugs
Re: posgres 12 bug (partitioned table)

Reading my own words, I think I must fix an ambiguity:

On Fri, Jul 10, 2020 at 3:23 PM Amit Langote <amitlangote09@gmail.com> wrote:

So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns.

The "maybe as an optimization" refers to the part of the sentence that
comes before it. That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
hackersbugs
Re: posgres 12 bug (partitioned table)

On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Biryukov <79166341370@yandex.ru> writes:

Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin

Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.

I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
hackersbugs
Re: posgres 12 bug (partitioned table)

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Biryukov <79166341370@yandex.ru> writes:

Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin

I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.

Well, if we want to allow such scenarios then we need to forbid queries
from accessing "system columns" of a partitioned table, much as we do for
views. Failing way down inside the executor seems quite unacceptable.

regards, tom lane

#16Pavel Biryukov
79166341370@yandex.ru
In reply to: Tom Lane (#15)
hackersbugs
Re: posgres 12 bug (partitioned table)

<div>Hello all!</div><div> </div><div>I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.</div><div>This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.</div><div> </div><div><a href="https://www.npgsql.org/efcore/index.html&quot;&gt;https://www.npgsql.org/efcore/index.html&lt;/a&gt;&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;-- &lt;br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 18:34, "Tom Lane" &lt;tgl@sss.pgh.pa.us&gt;:</div><blockquote><p>Robert Haas &lt;<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>&gt; writes:</p><blockquote> On Thu, Jun 4, 2020 at 12:57 PM Tom Lane &lt;<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>&gt; wrote:<blockquote> Pavel Biryukov &lt;<a href="mailto:79166341370@yandex.ru">79166341370@yandex.ru</a>&gt; writes:<blockquote> Hello. I'm catch error "virtual tuple table slot does not have system<br /> attributes" when inserting row into partitioned table with RETURNING<br /> xmin</blockquote></blockquote></blockquote><p> </p><blockquote> I wonder whether it's really correct to classify this as a bug. The<br /> subsequent discussion essentially boils down to this: the partitioned<br /> table's children could use any AM, and they might not all use the same<br /> AM. The system columns that are relevant for the heap may therefore be<br /> relevant to all, some, or none of the children. In fact, any fixed<br /> kind of tuple table slot we might choose to use for the parent has<br /> this problem. If all of the children are of the same type -- and today<br /> that would have to be heap -- then using that type of tuple table slot<br /> for the parent as well would make sense. But there's no real reason<br /> why that's the correct answer in general. If the children are all of<br /> some other type, using a heap slot for the parent is wrong; and if<br /> they're all of different types, it's unclear that anything other than<br /> a virtual slot makes any sense.</blockquote><p><br />Well, if we want to allow such scenarios then we need to forbid queries<br />from accessing "system columns" of a partitioned table, much as we do for<br />views. Failing way down inside the executor seems quite unacceptable.<br /><br />                        regards, tom lane</p></blockquote>

#17Robert Haas
robertmhaas@gmail.com
In reply to: Pavel Biryukov (#16)
hackersbugs
Re: posgres 12 bug (partitioned table)

On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:

I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.
This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.

That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Andres Freund
andres@anarazel.de
In reply to: Pavel Biryukov (#16)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hi,

On 2020-08-11 19:02:57 +0300, Pavel Biryukov wrote:

I just want to point that Npgsql provider for .Net Core builds queries like
that (RETURNING�xmin) to keep track for concurrency.

Could you provide a bit more details about what that's actually used
for?

Regards,

Andres

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
hackersbugs
Re: posgres 12 bug (partitioned table)

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:

I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.
This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.

That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.

For xmin in particular, you don't have to assume "everything's a heap".
What you have to assume is "everything uses MVCC", which seems a more
defensible position. It'll still fall down for foreign tables that are
partitions, though.

I echo Andres' nearby question about exactly why npgsql has such a
hard dependency on xmin. Maybe what we need is to try to abstract
that a little, and see if we could require all partition members
to support some unified concept of it.

regards, tom lane

#20Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
hackersbugs
Re: posgres 12 bug (partitioned table)

Hi,

On 2020-06-04 12:57:30 -0400, Tom Lane wrote:

Pavel Biryukov <79166341370@yandex.ru> writes:

Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin

Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.

The reason we're getting the failure is that nodeModifyTable.c only is
dealing with virtual tuple slots, which don't carry visibility
information. Despite actually having infrastructure for creating a
partition specific slot. If I force check_attrmap_match() to return
false, the example starts working.

I don't really know how to best fix this in the partitioning
infrastructure. Currently the determination whether there needs to be
any conversion between subplan slot and the slot used for insertion is
solely based on comparing tuple descriptors. But for better or worse, we
don't represent system column accesses in tuple descriptors.

It's not that hard to force the slot creation & use whenever there's
returning, but somehow that feels hackish (but so does plenty other
things in execPartition.c). See attached.

But I'm worried that that's not enough: What if somebody in a trigger
wants to access system columns besides tableoid and tid (which are
handled in a generic manner)? Currently - only for partitioned table DML
going through the root table - we'll not have valid values for the
trigger. It's pretty dubious imo to use xmin/xmax in triggers, but ...

I suspect we should just unconditionally use
partrouteinfo->pi_PartitionTupleSlot. Going through
partrouteinfo->pi_RootToPartitionMap if present, and ExecCopySlot()
otherwise.

Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column. There's no meaningful way for some AMs to have xmin / xmax
in a compatible way with heap.

Greetings,

Andres Freund

#21Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#20)
hackersbugs
#22Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
hackersbugs
#23Pavel Biryukov
79166341370@yandex.ru
In reply to: Andres Freund (#18)
hackersbugs
#24Andres Freund
andres@anarazel.de
In reply to: Pavel Biryukov (#23)
hackersbugs
#25Pavel Biryukov
79166341370@yandex.ru
In reply to: Andres Freund (#24)
hackersbugs
#26Pavel Biryukov
79166341370@yandex.ru
In reply to: Andres Freund (#24)
hackersbugs
#27Andres Freund
andres@anarazel.de
In reply to: Pavel Biryukov (#25)
hackersbugs
#28Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#20)
hackersbugs
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#28)
hackersbugs
#30Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#29)
hackersbugs
#31Pavel Biryukov
79166341370@yandex.ru
In reply to: Andres Freund (#24)
hackersbugs
#32Wilm Hoyer
W.Hoyer@dental-vision.de
In reply to: Pavel Biryukov (#31)
hackersbugs
#33Pavel Biryukov
79166341370@yandex.ru
In reply to: Wilm Hoyer (#32)
hackersbugs
#34Wilm Hoyer
W.Hoyer@dental-vision.de
In reply to: Pavel Biryukov (#33)
hackersbugs
#35Pavel Biryukov
79166341370@yandex.ru
In reply to: Wilm Hoyer (#34)
hackersbugs
#36Andres Freund
andres@anarazel.de
In reply to: Amit Langote (#30)
hackersbugs
#37Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Andres Freund (#36)
hackersbugs
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#37)
hackersbugs
#39Pavel Biryukov
79166341370@yandex.ru
In reply to: Amit Langote (#37)
hackersbugs
#40Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Pavel Biryukov (#39)
hackersbugs
#41Noname
kubilay.dag@post.ch
In reply to: Wilm Hoyer (#32)
hackersbugs
#42David Steele
david@pgmasters.net
In reply to: Amit Langote (#40)
hackersbugs
#43Pavel Biryukov
79166341370@yandex.ru
In reply to: David Steele (#42)
hackersbugs
#44Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Pavel Biryukov (#1)
hackersbugs
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#44)
hackersbugs
#46Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#45)
hackersbugs
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#46)
hackersbugs
#48Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#47)
hackersbugs
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#48)
hackersbugs
#50Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#49)
hackersbugs
#51Pavel Biryukov
79166341370@yandex.ru
In reply to: Andres Freund (#50)
hackersbugs
#52Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#45)
hackersbugs
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Langote (#52)
hackersbugs
#54Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#53)
hackersbugs
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#54)
hackersbugs
#56Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#55)
hackersbugs
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#56)
hackersbugs
#58Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#57)
hackersbugs
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#58)
hackersbugs
#60Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Tom Lane (#53)
hackersbugs