posgres 12 bug (partitioned table)
<div>Hello!</div><div>On postgres 12.3 the problem still exists (<a href="/messages/by-id/16446-2011a4b103fc5fd1@postgresql.org">https://www.postgresql.org/message-id/16446-2011a4b103fc5fd1@postgresql.org</a>):</div><div>%C2%A0</div><div>(Tested on PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit)</div><div> </div><div><div><div>Bug reference: 16446</div><div>Logged by: Георгий Драк</div><div>Email address: sonicgd(at)gmail(dot)com</div><div>PostgreSQL version: 12.2</div><div>Operating system: Debian 10.3</div><div>Description: </div><div> </div><div> </div><div>Hello. I'm catch error "virtual tuple table slot does not have system</div><div>attributes" when inserting row into partitioned table with RETURNING xmin;</div><div> </div><div> </div><div>Reproduction.</div><div> </div><div> </div><div>1. Create schema</div><div>CREATE TABLE "tmp"</div><div>(</div><div> id bigint generated always as identity,</div><div> date timestamptz not null,</div><div> foo int not null,</div><div> PRIMARY KEY ("id", "date")</div><div>)</div><div> PARTITION BY RANGE ("date");</div><div>CREATE TABLE "tmp_2020" PARTITION OF "tmp" FOR VALUES FROM ('2020-01-01') TO</div><div>('2021-01-01');</div><div> </div><div> </div><div>2. Execute query</div><div>INSERT INTO "tmp" ("date", "foo")</div><div>VALUES (NOW(), 1)</div><div>RETURNING id, xmin;</div><div> </div><div> </div><div>3. Result - ERROR: virtual tuple table slot does not have system</div><div>attributes</div><div> </div><div> </div><div>4. Expected result - id and xmin of inserted row.</div><div> </div></div></div>
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin
Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
regards, tom lane
On Fri, 5 Jun 2020 at 04:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
Looks like c2fe139c20 is the breaking commit.
David
<div>Hello!</div><div> </div><div>Is it going to be fixed? This problem stops us from migrating to 12...</div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>05.06.2020, 03:18, "David Rowley" <dgrowleyml@gmail.com>:</div><blockquote><p>On Fri, 5 Jun 2020 at 04:57, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:</p><blockquote> Reproduced here. The example works in v11, and in v10 if you remove<br /> the unnecessary-to-the-example primary key, so it seems like a clear<br /> regression. I didn't dig for the cause but I suppose it's related<br /> to Andres' slot-related changes.</blockquote><p><br />Looks like c2fe139c20 is the breaking commit.<br /><br />David</p></blockquote>
Hello,
ccing pgsql-hackers@postgresql.org
Upon investigation, it seems that the problem is caused by the
following:
The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot. If there are system
columns other than ctid and tableOid referenced in the RETURNING clause
(not only xmin as in the bug report), it will lead to the ERROR as
mentioned in this thread as virtual tuple table slots don't really store
such columns. (ctid and tableOid are present directly in the
TupleTableSlot struct and can be satisfied from there: refer:
slot_getsysattr()))
I have attached two alternate patches to solve the problem. Both patches
use and share a mechanism to detect if there are any such system
columns. This is done inside ExecBuildProjectionInfo() and we store this
info inside the ProjectionInfo struct. Then based on this info, system
columns are populated in a suitable slot, which is then passed on to
ExecProcessReturning(). (If it is deemed that this operation only be
done for RETURNING, we can just as easily do it in the callsite for
ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead
of doing it inside ExecBuildProjectionInfo())
The first patch [1]v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch explicitly creates a heap tuple table slot, fills in the
system column values as we would do during heap_prepare_insert() and
then passes that slot to ExecProcessReturning(). (We use a heap tuple table
slot as it is guaranteed to support these attributes).
The second patch [2]v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch instead of relying on a heap tuple table slot,
relies on ExecGetReturningSlot() for the right slot and
table_tuple_fetch_row_version() to supply the system column values. It
does make the assumption that the AM would supply a slot that will have
these system columns.
[1]: v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch
[2]: v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch
Regards,
Soumyadeep (VMware)
Attachments:
v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patchapplication/x-patch; name=v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patchDownload+43-2
v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patchapplication/octet-stream; name=v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patchDownload+40-2
Hi Soumyadeep,
Thanks for picking this up.
On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
Upon investigation, it seems that the problem is caused by the
following:The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot.
Actually, not that often in practice. The slot is not virtual, for
example, when inserting into a regular non-partitioned table. Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():
mtstate->mt_scans[i] =
ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),
table_slot_callbacks(resultRelInfo->ri_RelationDesc));
Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation. For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:
if (relation->rd_tableam)
tts_cb = relation->rd_tableam->slot_callbacks(relation);
else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
/*
* Historically FDWs expect to store heap tuples in slots. Continue
* handing them one, to make it less painful to adapt FDWs to new
* versions. The cost of a heap slot over a virtual slot is pretty
* small.
*/
tts_cb = &TTSOpsHeapTuple;
}
else
{
/*
* These need to be supported, as some parts of the code (like COPY)
* need to create slots for such relations too. It seems better to
* centralize the knowledge that a heap slot is the right thing in
* that case here.
*/
Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
tts_cb = &TTSOpsVirtual;
}
If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.
I have attached two alternate patches to solve the problem.
IMHO, they are solving the problem at the wrong place. We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with. We could
do what I suggest above or maybe find some other way.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Attachments:
partitioned-table-heap-slot.patchapplication/octet-stream; name=partitioned-table-heap-slot.patchDownload+3-3
Hi Amit,
Thanks for your reply!
On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Soumyadeep,
Thanks for picking this up.
On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:Upon investigation, it seems that the problem is caused by the
following:The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot.Actually, not that often in practice. The slot is not virtual, for
example, when inserting into a regular non-partitioned table.
Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.
If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.
This is what I had thought of initially but I had taken a step back for 2
reasons:
1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1]https://github.com/greenplum-db/postgres/tree/zedstore, a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.
2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case. However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.
All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.
[1]: https://github.com/greenplum-db/postgres/tree/zedstore
Regards,
Soumyadeep
Hi Soumyadeep,
On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.This is what I had thought of initially but I had taken a step back for 2
reasons:1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.
Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Hey Amit,
On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.
Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.
Regards,
Soumyadeep (VMware)
On Thu, Jul 9, 2020 at 1:53 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.
Well, I was hoping that table_tuple_insert() would fill that info, but
you did say upthread that table AMs are not exactly expected to do so,
so maybe you have a point.
By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Hey Amit,
On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?
We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"
A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.
For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).
The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same. The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.
I really wonder what other AMs are doing about these issues.
I think we should either:
1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.
2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.
3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.
For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls, as well as demand a heap slot for partition roots
and interior nodes. And then later on. we would need a larger effort
making all of these APIs not really demand transactional information.
Perhaps the UNDO framework will come to the rescue.
Regards,
Soumyadeep (VMware)
Hi Soumyadeep,
On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
Hey Amit,
On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same. The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.I really wonder what other AMs are doing about these issues.
I think we should either:
1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.
So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns. We shouldn't really need any
new core code to get the transaction-related system columns while
there exists a perfectly reasonable channel for it to arrive through
-- TupleTableSlots. I suppose there's a reason why we allow AMs to
provide their own slot callbacks.
Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.
3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls,
As long as the AM's slot_getsysattr() callback returns the correct
value, this works.
as well as demand a heap slot for partition roots
and interior nodes.
It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Reading my own words, I think I must fix an ambiguity:
On Fri, Jul 10, 2020 at 3:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns.
The "maybe as an optimization" refers to the part of the sentence that
comes before it. That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xminReproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin
I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.
Well, if we want to allow such scenarios then we need to forbid queries
from accessing "system columns" of a partitioned table, much as we do for
views. Failing way down inside the executor seems quite unacceptable.
regards, tom lane
<div>Hello all!</div><div> </div><div>I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.</div><div>This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.</div><div> </div><div><a href="https://www.npgsql.org/efcore/index.html">https://www.npgsql.org/efcore/index.html</a></div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 18:34, "Tom Lane" <tgl@sss.pgh.pa.us>:</div><blockquote><p>Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> writes:</p><blockquote> On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<blockquote> Pavel Biryukov <<a href="mailto:79166341370@yandex.ru">79166341370@yandex.ru</a>> writes:<blockquote> Hello. I'm catch error "virtual tuple table slot does not have system<br /> attributes" when inserting row into partitioned table with RETURNING<br /> xmin</blockquote></blockquote></blockquote><p> </p><blockquote> I wonder whether it's really correct to classify this as a bug. The<br /> subsequent discussion essentially boils down to this: the partitioned<br /> table's children could use any AM, and they might not all use the same<br /> AM. The system columns that are relevant for the heap may therefore be<br /> relevant to all, some, or none of the children. In fact, any fixed<br /> kind of tuple table slot we might choose to use for the parent has<br /> this problem. If all of the children are of the same type -- and today<br /> that would have to be heap -- then using that type of tuple table slot<br /> for the parent as well would make sense. But there's no real reason<br /> why that's the correct answer in general. If the children are all of<br /> some other type, using a heap slot for the parent is wrong; and if<br /> they're all of different types, it's unclear that anything other than<br /> a virtual slot makes any sense.</blockquote><p><br />Well, if we want to allow such scenarios then we need to forbid queries<br />from accessing "system columns" of a partitioned table, much as we do for<br />views. Failing way down inside the executor seems quite unacceptable.<br /><br /> regards, tom lane</p></blockquote>
On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.
This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.
That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-08-11 19:02:57 +0300, Pavel Biryukov wrote:
I just want to point that Npgsql provider for .Net Core builds queries like
that (RETURNING�xmin) to keep track for concurrency.
Could you provide a bit more details about what that's actually used
for?
Regards,
Andres
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.
This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.
That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.
For xmin in particular, you don't have to assume "everything's a heap".
What you have to assume is "everything uses MVCC", which seems a more
defensible position. It'll still fall down for foreign tables that are
partitions, though.
I echo Andres' nearby question about exactly why npgsql has such a
hard dependency on xmin. Maybe what we need is to try to abstract
that a little, and see if we could require all partition members
to support some unified concept of it.
regards, tom lane
Hi,
On 2020-06-04 12:57:30 -0400, Tom Lane wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xminReproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
The reason we're getting the failure is that nodeModifyTable.c only is
dealing with virtual tuple slots, which don't carry visibility
information. Despite actually having infrastructure for creating a
partition specific slot. If I force check_attrmap_match() to return
false, the example starts working.
I don't really know how to best fix this in the partitioning
infrastructure. Currently the determination whether there needs to be
any conversion between subplan slot and the slot used for insertion is
solely based on comparing tuple descriptors. But for better or worse, we
don't represent system column accesses in tuple descriptors.
It's not that hard to force the slot creation & use whenever there's
returning, but somehow that feels hackish (but so does plenty other
things in execPartition.c). See attached.
But I'm worried that that's not enough: What if somebody in a trigger
wants to access system columns besides tableoid and tid (which are
handled in a generic manner)? Currently - only for partitioned table DML
going through the root table - we'll not have valid values for the
trigger. It's pretty dubious imo to use xmin/xmax in triggers, but ...
I suspect we should just unconditionally use
partrouteinfo->pi_PartitionTupleSlot. Going through
partrouteinfo->pi_RootToPartitionMap if present, and ExecCopySlot()
otherwise.
Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column. There's no meaningful way for some AMs to have xmin / xmax
in a compatible way with heap.
Greetings,
Andres Freund