posgres 12 bug (partitioned table)
<div>Hello!</div><div>On postgres 12.3 the problem still exists (<a href="/messages/by-id/16446-2011a4b103fc5fd1@postgresql.org">https://www.postgresql.org/message-id/16446-2011a4b103fc5fd1@postgresql.org</a>):</div><div>%C2%A0</div><div>(Tested on PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit)</div><div> </div><div><div><div>Bug reference: 16446</div><div>Logged by: Георгий Драк</div><div>Email address: sonicgd(at)gmail(dot)com</div><div>PostgreSQL version: 12.2</div><div>Operating system: Debian 10.3</div><div>Description: </div><div> </div><div> </div><div>Hello. I'm catch error "virtual tuple table slot does not have system</div><div>attributes" when inserting row into partitioned table with RETURNING xmin;</div><div> </div><div> </div><div>Reproduction.</div><div> </div><div> </div><div>1. Create schema</div><div>CREATE TABLE "tmp"</div><div>(</div><div> id bigint generated always as identity,</div><div> date timestamptz not null,</div><div> foo int not null,</div><div> PRIMARY KEY ("id", "date")</div><div>)</div><div> PARTITION BY RANGE ("date");</div><div>CREATE TABLE "tmp_2020" PARTITION OF "tmp" FOR VALUES FROM ('2020-01-01') TO</div><div>('2021-01-01');</div><div> </div><div> </div><div>2. Execute query</div><div>INSERT INTO "tmp" ("date", "foo")</div><div>VALUES (NOW(), 1)</div><div>RETURNING id, xmin;</div><div> </div><div> </div><div>3. Result - ERROR: virtual tuple table slot does not have system</div><div>attributes</div><div> </div><div> </div><div>4. Expected result - id and xmin of inserted row.</div><div> </div></div></div>
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin
Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
regards, tom lane
On Fri, 5 Jun 2020 at 04:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Reproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
Looks like c2fe139c20 is the breaking commit.
David
<div>Hello!</div><div> </div><div>Is it going to be fixed? This problem stops us from migrating to 12...</div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>05.06.2020, 03:18, "David Rowley" <dgrowleyml@gmail.com>:</div><blockquote><p>On Fri, 5 Jun 2020 at 04:57, Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:</p><blockquote> Reproduced here. The example works in v11, and in v10 if you remove<br /> the unnecessary-to-the-example primary key, so it seems like a clear<br /> regression. I didn't dig for the cause but I suppose it's related<br /> to Andres' slot-related changes.</blockquote><p><br />Looks like c2fe139c20 is the breaking commit.<br /><br />David</p></blockquote>
Hello,
ccing pgsql-hackers@postgresql.org
Upon investigation, it seems that the problem is caused by the
following:
The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot. If there are system
columns other than ctid and tableOid referenced in the RETURNING clause
(not only xmin as in the bug report), it will lead to the ERROR as
mentioned in this thread as virtual tuple table slots don't really store
such columns. (ctid and tableOid are present directly in the
TupleTableSlot struct and can be satisfied from there: refer:
slot_getsysattr()))
I have attached two alternate patches to solve the problem. Both patches
use and share a mechanism to detect if there are any such system
columns. This is done inside ExecBuildProjectionInfo() and we store this
info inside the ProjectionInfo struct. Then based on this info, system
columns are populated in a suitable slot, which is then passed on to
ExecProcessReturning(). (If it is deemed that this operation only be
done for RETURNING, we can just as easily do it in the callsite for
ExecBuildProjectionInfo() in ExecInitModifyTable() for RETURNING instead
of doing it inside ExecBuildProjectionInfo())
The first patch [1]v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch explicitly creates a heap tuple table slot, fills in the
system column values as we would do during heap_prepare_insert() and
then passes that slot to ExecProcessReturning(). (We use a heap tuple table
slot as it is guaranteed to support these attributes).
The second patch [2]v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch instead of relying on a heap tuple table slot,
relies on ExecGetReturningSlot() for the right slot and
table_tuple_fetch_row_version() to supply the system column values. It
does make the assumption that the AM would supply a slot that will have
these system columns.
[1]: v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patch
[2]: v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patch
Regards,
Soumyadeep (VMware)
Attachments:
v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patchapplication/x-patch; name=v1-0001-Explicitly-supply-system-columns-for-INSERT.RETUR.patchDownload
From baa48a89e571405f4cd802f065d0f82131599f53 Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <soumyadeep2007@gmail.com>
Date: Sun, 5 Jul 2020 10:23:30 -0700
Subject: [PATCH v1 1/1] Explicitly supply system columns for INSERT..RETURNING
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If there are system columns other than ctid and tableOid in the
RETURNING clause for an INSERT, we must ensure that the slot we pass to
ExecProcessReturning() has those columns (ctid and tableOid are present
directly in the TupleTableSlot struct and can be satisifed from there:
refer: slot_getsysattr)).
This is in response to bug #16446 reported by Георгий Драк. In the bug
an INSERT..RETURNING statement fails with:
"ERROR: virtual tuple table slot does not have system
attributes"
This is caused due to the fact that ExecProcessReturning() was fed with
a virtual tuple table slot. Thus the resultant expression evaluation
and by extension, call to ExecEvalSysVar(), blows up with the
aforementioned error.
So, we fix this by:
1. Determining if there are system columns (other than tableOid and
ctid) referenced in ExecBuildProjectionInfo() and we store this info
inside the ProjectionInfo struct.
2. If there are such system columns in RETURNING, we explicitly create a
heap tuple table slot, fill in the system column values as we would do
during heap_prepare_insert() and then pass that slot to
ExecProcessReturning(). We use a heap tuple table slot as it is
guaranteed to support these attributes.
---
src/backend/executor/execExpr.c | 19 ++++++++++++++++++-
src/backend/executor/nodeModifyTable.c | 23 +++++++++++++++++++++++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..8cd966d227 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -360,10 +360,12 @@ ExecBuildProjectionInfo(List *targetList,
ExprState *state;
ExprEvalStep scratch = {0};
ListCell *lc;
+ List *tlist_vars;
projInfo->pi_exprContext = econtext;
/* We embed ExprState into ProjectionInfo instead of doing extra palloc */
projInfo->pi_state.tag = T_ExprState;
+ projInfo->has_non_slot_system_cols = false;
state = &projInfo->pi_state;
state->expr = (Expr *) targetList;
state->parent = parent;
@@ -455,7 +457,6 @@ ExecBuildProjectionInfo(List *targetList,
*/
ExecInitExprRec(tle->expr, state,
&state->resvalue, &state->resnull);
-
/*
* Column might be referenced multiple times in upper nodes, so
* force value to R/O - but only if it could be an expanded datum.
@@ -469,6 +470,22 @@ ExecBuildProjectionInfo(List *targetList,
}
}
+ /* Record system columns that are part of this projection */
+ tlist_vars = pull_var_clause((Node *) targetList,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+ foreach(lc, tlist_vars)
+ {
+ Var *var = (Var *) lfirst(lc);
+ if (var->varattno < 0 && (var->varattno != TableOidAttributeNumber ||
+ var->varattno != SelfItemPointerAttributeNumber))
+ {
+ projInfo->has_non_slot_system_cols = true;
+ break;
+ }
+ }
+
scratch.opcode = EEOP_DONE;
ExprEvalPushStep(state, &scratch);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..956bdf0afd 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,7 +678,30 @@ ExecInsert(ModifyTableState *mtstate,
/* Process RETURNING if present */
if (resultRelInfo->ri_projectReturning)
+ {
+ /*
+ * If the RETURNING list contains system columns other than ctid and
+ * tableOid, we should make sure that the system columns are available
+ * in a slot that supports system columns.
+ */
+ if (TTS_IS_VIRTUAL(slot) && resultRelInfo->ri_projectReturning->has_non_slot_system_cols)
+ {
+ /*
+ * Explicitly supply the system columns that are not ctid and
+ * tableOid. So, supply the system columns as we do when we insert.
+ * See heap_prepare_insert().
+ */
+ TupleTableSlot *returningSlot = ExecInitExtraTupleSlot(estate,
+ RelationGetDescr(resultRelationDesc),
+ &TTSOpsHeapTuple);
+ HeapTuple heapTuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+ HeapTupleHeaderSetXmin(heapTuple->t_data, GetCurrentTransactionId());
+ HeapTupleHeaderSetXmax(heapTuple->t_data, 0);
+ HeapTupleHeaderSetCmin(heapTuple->t_data, estate->es_output_cid);
+ slot = ExecStoreHeapTuple(heapTuple, returningSlot, true);
+ }
result = ExecProcessReturning(resultRelInfo, slot, planSlot);
+ }
return result;
}
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f5dfa32d55..f8047d2089 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -333,6 +333,8 @@ typedef struct ProjectionInfo
ExprState pi_state;
/* expression context in which to evaluate expression */
ExprContext *pi_exprContext;
+ /* projection contains system columns other than ctid and tableOid */
+ bool has_non_slot_system_cols;
} ProjectionInfo;
/* ----------------
--
2.24.1
v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patchapplication/octet-stream; name=v1-0001-Use-table_tuple_fetch_row_version-to-supply-INSER.patchDownload
From 067096fa0c47191359911b1c2d441eae6ef99d9d Mon Sep 17 00:00:00 2001
From: soumyadeep2007 <soumyadeep2007@gmail.com>
Date: Sun, 5 Jul 2020 10:23:30 -0700
Subject: [PATCH v1 1/1] Use table_tuple_fetch_row_version() to supply
INSERT..RETURNING system cols
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If there are system columns other than ctid and tableOid in the
RETURNING clause for an INSERT, we must ensure that the slot we pass to
ExecProcessReturning() has those columns (ctid and tableOid are present
directly in the TupleTableSlot struct and can be satisifed from there:
refer: slot_getsysattr)).
This is in response to bug #16446 reported by Георгий Драк. In the bug
an INSERT..RETURNING statement fails with:
"ERROR: virtual tuple table slot does not have system
attributes"
This is caused due to the fact that ExecProcessReturning() was fed with
a virtual tuple table slot. Thus the resultant expression evaluation
and by extension, call to ExecEvalSysVar(), blows up with the
aforementioned error.
So, we fix this by:
1. Determining if there are system columns (other than tableOid and
ctid) referenced in ExecBuildProjectionInfo() and we store this info
inside the ProjectionInfo struct.
2. If there are such system columns in RETURNING, we fetch the tuple
being inserted anew using table_tuple_fetch_row_version() and pass it
the slot we obtain from ExecGetReturningSlot(). This makes
the assumption that such system columns would be present in the
resulting slot. Then we pass this slot to ExecProcessReturning().
---
src/backend/executor/execExpr.c | 19 ++++++++++++++++++-
src/backend/executor/nodeModifyTable.c | 20 ++++++++++++++++++++
src/include/nodes/execnodes.h | 2 ++
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 236413f62a..8cd966d227 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -360,10 +360,12 @@ ExecBuildProjectionInfo(List *targetList,
ExprState *state;
ExprEvalStep scratch = {0};
ListCell *lc;
+ List *tlist_vars;
projInfo->pi_exprContext = econtext;
/* We embed ExprState into ProjectionInfo instead of doing extra palloc */
projInfo->pi_state.tag = T_ExprState;
+ projInfo->has_non_slot_system_cols = false;
state = &projInfo->pi_state;
state->expr = (Expr *) targetList;
state->parent = parent;
@@ -455,7 +457,6 @@ ExecBuildProjectionInfo(List *targetList,
*/
ExecInitExprRec(tle->expr, state,
&state->resvalue, &state->resnull);
-
/*
* Column might be referenced multiple times in upper nodes, so
* force value to R/O - but only if it could be an expanded datum.
@@ -469,6 +470,22 @@ ExecBuildProjectionInfo(List *targetList,
}
}
+ /* Record system columns that are part of this projection */
+ tlist_vars = pull_var_clause((Node *) targetList,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_INCLUDE_PLACEHOLDERS);
+ foreach(lc, tlist_vars)
+ {
+ Var *var = (Var *) lfirst(lc);
+ if (var->varattno < 0 && (var->varattno != TableOidAttributeNumber ||
+ var->varattno != SelfItemPointerAttributeNumber))
+ {
+ projInfo->has_non_slot_system_cols = true;
+ break;
+ }
+ }
+
scratch.opcode = EEOP_DONE;
ExprEvalPushStep(state, &scratch);
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c474cc..5af3eeb2e6 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -678,7 +678,27 @@ ExecInsert(ModifyTableState *mtstate,
/* Process RETURNING if present */
if (resultRelInfo->ri_projectReturning)
+ {
+ /*
+ * If the RETURNING list contains system columns other than ctid and
+ * tableOid, we should make sure that the system columns are available
+ * in a slot that supports system columns.
+ */
+ if (TTS_IS_VIRTUAL(slot) && resultRelInfo->ri_projectReturning->has_non_slot_system_cols)
+ {
+ /* Fetch the inserted tuple to ensure that system columns are present. */
+ TupleTableSlot *returningSlot =
+ ExecGetReturningSlot(estate, resultRelInfo);
+ Assert(!TTS_IS_VIRTUAL(returningSlot));
+ if (!table_tuple_fetch_row_version(resultRelationDesc,
+ &slot->tts_tid,
+ SnapshotAny,
+ returningSlot))
+ elog(ERROR, "failed to fetch inserted tuple for INSERT RETURNING");
+ slot = returningSlot;
+ }
result = ExecProcessReturning(resultRelInfo, slot, planSlot);
+ }
return result;
}
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f5dfa32d55..f8047d2089 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -333,6 +333,8 @@ typedef struct ProjectionInfo
ExprState pi_state;
/* expression context in which to evaluate expression */
ExprContext *pi_exprContext;
+ /* projection contains system columns other than ctid and tableOid */
+ bool has_non_slot_system_cols;
} ProjectionInfo;
/* ----------------
--
2.24.1
Hi Soumyadeep,
Thanks for picking this up.
On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
Upon investigation, it seems that the problem is caused by the
following:The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot.
Actually, not that often in practice. The slot is not virtual, for
example, when inserting into a regular non-partitioned table. Whether
or not it is virtual depends on the following piece of code in
ExecInitModifyTable():
mtstate->mt_scans[i] =
ExecInitExtraTupleSlot(mtstate->ps.state,
ExecGetResultType(mtstate->mt_plans[i]),
table_slot_callbacks(resultRelInfo->ri_RelationDesc));
Specifically, the call to table_slot_callbacks() above determines what
kind of slot is assigned for a given target relation. For partitioned
tables, it happens to return a virtual slot currently, per this
implementation:
if (relation->rd_tableam)
tts_cb = relation->rd_tableam->slot_callbacks(relation);
else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
{
/*
* Historically FDWs expect to store heap tuples in slots. Continue
* handing them one, to make it less painful to adapt FDWs to new
* versions. The cost of a heap slot over a virtual slot is pretty
* small.
*/
tts_cb = &TTSOpsHeapTuple;
}
else
{
/*
* These need to be supported, as some parts of the code (like COPY)
* need to create slots for such relations too. It seems better to
* centralize the knowledge that a heap slot is the right thing in
* that case here.
*/
Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
tts_cb = &TTSOpsVirtual;
}
If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.
I have attached two alternate patches to solve the problem.
IMHO, they are solving the problem at the wrong place. We should
really fix things so that the slot that gets passed down to
ExecProcessReturning() is of the correct type to begin with. We could
do what I suggest above or maybe find some other way.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Attachments:
partitioned-table-heap-slot.patchapplication/octet-stream; name=partitioned-table-heap-slot.patchDownload
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 4b2bb29..7796a0c 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -47,7 +47,8 @@ table_slot_callbacks(Relation relation)
if (relation->rd_tableam)
tts_cb = relation->rd_tableam->slot_callbacks(relation);
- else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ else if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
{
/*
* Historically FDWs expect to store heap tuples in slots. Continue
@@ -65,8 +66,7 @@ table_slot_callbacks(Relation relation)
* centralize the knowledge that a heap slot is the right thing in
* that case here.
*/
- Assert(relation->rd_rel->relkind == RELKIND_VIEW ||
- relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
+ Assert(relation->rd_rel->relkind == RELKIND_VIEW);
tts_cb = &TTSOpsVirtual;
}
Hi Amit,
Thanks for your reply!
On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
Hi Soumyadeep,
Thanks for picking this up.
On Tue, Jul 7, 2020 at 7:46 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:Upon investigation, it seems that the problem is caused by the
following:The slot passed to the call to ExecProcessReturning() inside
ExecInsert() is often a virtual tuple table slot.Actually, not that often in practice. The slot is not virtual, for
example, when inserting into a regular non-partitioned table.
Indeed! I meant partitioned tables are a common use case. Sorry, I
should have elaborated.
If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.
This is what I had thought of initially but I had taken a step back for 2
reasons:
1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1]https://github.com/greenplum-db/postgres/tree/zedstore, a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.
2. This is a secondary aspect but I will mention it here for
completeness. Not knowing enough about this code: will demanding heap
tuples for partitioned tables all throughout the code have a performance
impact? At a first glance it didn't seem to be the case. However, I did
find lots of callsites for partitioning or otherwise where we kind of
expect a virtual tuple table slot (as evidenced with the calls to
ExecStoreVirtualTuple()). With your patch, we seem to be calling
ExecStoreVirtualTuple() on a heap tuple table slot, in various places:
such as inside execute_attr_map_slot(). It seems to be harmless to do so
however, in accordance with my limited investigation.
All in all, I think we have to explicitly supply those system columns. I
heard from Daniel that one of the motivations for having table AMs
was to ensure that transaction meta-data storage is not demanded off any
AM.
[1]: https://github.com/greenplum-db/postgres/tree/zedstore
Regards,
Soumyadeep
Hi Soumyadeep,
On Wed, Jul 8, 2020 at 9:37 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
On Tue, Jul 7, 2020 at 7:18 AM Amit Langote <amitlangote09@gmail.com> wrote:
If I change this to return a "heap" slot for partitioned tables, just
like for foreign tables, the problem goes away (see the attached). In
fact, even make check-world passes, so I don't know why it isn't that
way to begin with.This is what I had thought of initially but I had taken a step back for 2
reasons:1. It is not mandatory for an AM to supply a heap tuple in the slot
passed to any AM routine. With your patch, the slot passed to
table_tuple_insert() inside ExecInsert() for instance is now expected to
supply a heap tuple for the subsequent call to ExecProcessReturning().
This can lead to garbage values for xmin, xmax, cmin and cmax. I tried
applying your patch on Zedstore [1], a columnar AM, and consequently, I
got incorrect values for xmin, xmax etc with the query reported in this
issue.
Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Hey Amit,
On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.
Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.
Regards,
Soumyadeep (VMware)
On Thu, Jul 9, 2020 at 1:53 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
On Tue, Jul 7, 2020 at 7:17 PM Amit Langote <amitlangote09@gmail.com> wrote:
Ah, I see. You might've noticed that ExecInsert() only ever sees leaf
partitions, because tuple routing would've switched the result
relation to a leaf partition by the time we are in ExecInsert(). So,
table_tuple_insert() always refers to a leaf partition's AM. Not
sure if you've also noticed but each leaf partition gets to own a slot
(PartitionRoutingInfo.pi_PartitionTupleSlot), but currently it is only
used if the leaf partition attribute numbers are not the same as the
root partitioned table. How about we also use it if the leaf
partition AM's table_slot_callbacks() differs from the root
partitioned table's slot's tts_ops? That would be the case, for
example, if the leaf partition is of Zedstore AM. In the more common
cases where all leaf partitions are of heap AM, this would mean the
original slot would be used as is, that is, if we accept hard-coding
table_slot_callbacks() to return a "heap" slot for partitioned tables
as I suggest.Even then, we will still need to fill in the system columns explicitly as
pi_PartitionTupleSlot will not be filled with system columns after
it comes back out of table_tuple_insert() if we have a non-heap AM.
Well, I was hoping that table_tuple_insert() would fill that info, but
you did say upthread that table AMs are not exactly expected to do so,
so maybe you have a point.
By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Hey Amit,
On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?
We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"
A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.
For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).
The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same. The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.
I really wonder what other AMs are doing about these issues.
I think we should either:
1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.
2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.
3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.
For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls, as well as demand a heap slot for partition roots
and interior nodes. And then later on. we would need a larger effort
making all of these APIs not really demand transactional information.
Perhaps the UNDO framework will come to the rescue.
Regards,
Soumyadeep (VMware)
Hi Soumyadeep,
On Fri, Jul 10, 2020 at 2:56 AM Soumyadeep Chakraborty
<soumyadeep2007@gmail.com> wrote:
Hey Amit,
On Thu, Jul 9, 2020 at 12:16 AM Amit Langote <amitlangote09@gmail.com> wrote:
By the way, what happens today if you do INSERT INTO a_zedstore_table
... RETURNING xmin? Do you get an error "xmin is unrecognized" or
some such in slot_getsysattr() when trying to project the RETURNING
list?We get garbage values for xmin and cmin. If we request cmax/xmax, we get
an ERROR from slot_getsystattr()->tts_zedstore_getsysattr():
"zedstore tuple table slot does not have system attributes (except xmin
and cmin)"A ZedstoreTupleTableSlot only stores xmin and xmax. Also,
zedstoream_insert(), which is the tuple_insert() implementation, does
not supply the xmin/cmin, thus making those values garbage.For context, Zedstore has its own UNDO log implementation to act as
storage for transaction information. (which is intended to be replaced
with the upstream UNDO log in the future).The above behavior is not just restricted to INSERT..RETURNING, right
now. If we do a select <tx_column> from foo in Zedstore, the behavior is
the same. The transaction information is never returned from Zedstore
in tableam calls that don't demand transactional information be
used/returned. If you ask it to do a tuple_satisfies_snapshot(), OTOH,
it will use the transactional information correctly. It will also
populate TM_FailureData, which contains xmax and cmax, in the APIs where
it is demanded.I really wonder what other AMs are doing about these issues.
I think we should either:
1. Demand transactional information off of AMs for all APIs that involve
a projection of transactional information.2. Have some other component of Postgres supply the transactional
information. This is what I think the upstream UNDO log can probably
provide.
So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns. We shouldn't really need any
new core code to get the transaction-related system columns while
there exists a perfectly reasonable channel for it to arrive through
-- TupleTableSlots. I suppose there's a reason why we allow AMs to
provide their own slot callbacks.
Whether an AM uses UNDO log or something else to manage the
transaction info is up to the AM, so I don't see why the AMs
themselves shouldn't be in charge of returning that info, because only
they know where it is.
3. (Least elegant) Transform tuple table slots into heap tuple table
slots (since it is the only kind of tuple storage that can supply
transactional info) and explicitly fill in the transactional values
depending on the context, whenever transactional information is
projected.For this bug report, I am not sure what is right. Perhaps, to stop the
bleeding temporarily, we could use the pi_PartitionTupleSlot and assume
that the AM needs to provide the transactional info in the respective
insert AM API calls,
As long as the AM's slot_getsysattr() callback returns the correct
value, this works.
as well as demand a heap slot for partition roots
and interior nodes.
It would be a compromise on the core's part to use "heap" slots for
partitioned tables, because they don't have a valid table AM.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Reading my own words, I think I must fix an ambiguity:
On Fri, Jul 10, 2020 at 3:23 PM Amit Langote <amitlangote09@gmail.com> wrote:
So even if an AM's table_tuple_insert() itself doesn't populate the
transaction info into the slot handed to it, maybe as an optimization,
it does not sound entirely unreasonable to expect that the AM's
slot_getsysattr() callback returns it correctly when projecting a
target list containing system columns.
The "maybe as an optimization" refers to the part of the sentence that
comes before it. That is, I mean table_tuple_insert() may choose to
not populate the transaction info in the slot as an optimization.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xminReproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xmin
I wonder whether it's really correct to classify this as a bug. The
subsequent discussion essentially boils down to this: the partitioned
table's children could use any AM, and they might not all use the same
AM. The system columns that are relevant for the heap may therefore be
relevant to all, some, or none of the children. In fact, any fixed
kind of tuple table slot we might choose to use for the parent has
this problem. If all of the children are of the same type -- and today
that would have to be heap -- then using that type of tuple table slot
for the parent as well would make sense. But there's no real reason
why that's the correct answer in general. If the children are all of
some other type, using a heap slot for the parent is wrong; and if
they're all of different types, it's unclear that anything other than
a virtual slot makes any sense.
Well, if we want to allow such scenarios then we need to forbid queries
from accessing "system columns" of a partitioned table, much as we do for
views. Failing way down inside the executor seems quite unacceptable.
regards, tom lane
<div>Hello all!</div><div> </div><div>I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.</div><div>This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.</div><div> </div><div><a href="https://www.npgsql.org/efcore/index.html">https://www.npgsql.org/efcore/index.html</a></div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 18:34, "Tom Lane" <tgl@sss.pgh.pa.us>:</div><blockquote><p>Robert Haas <<a href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>> writes:</p><blockquote> On Thu, Jun 4, 2020 at 12:57 PM Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>> wrote:<blockquote> Pavel Biryukov <<a href="mailto:79166341370@yandex.ru">79166341370@yandex.ru</a>> writes:<blockquote> Hello. I'm catch error "virtual tuple table slot does not have system<br /> attributes" when inserting row into partitioned table with RETURNING<br /> xmin</blockquote></blockquote></blockquote><p> </p><blockquote> I wonder whether it's really correct to classify this as a bug. The<br /> subsequent discussion essentially boils down to this: the partitioned<br /> table's children could use any AM, and they might not all use the same<br /> AM. The system columns that are relevant for the heap may therefore be<br /> relevant to all, some, or none of the children. In fact, any fixed<br /> kind of tuple table slot we might choose to use for the parent has<br /> this problem. If all of the children are of the same type -- and today<br /> that would have to be heap -- then using that type of tuple table slot<br /> for the parent as well would make sense. But there's no real reason<br /> why that's the correct answer in general. If the children are all of<br /> some other type, using a heap slot for the parent is wrong; and if<br /> they're all of different types, it's unclear that anything other than<br /> a virtual slot makes any sense.</blockquote><p><br />Well, if we want to allow such scenarios then we need to forbid queries<br />from accessing "system columns" of a partitioned table, much as we do for<br />views. Failing way down inside the executor seems quite unacceptable.<br /><br /> regards, tom lane</p></blockquote>
On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.
This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.
That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-08-11 19:02:57 +0300, Pavel Biryukov wrote:
I just want to point that Npgsql provider for .Net Core builds queries like
that (RETURNING�xmin) to keep track for concurrency.
Could you provide a bit more details about what that's actually used
for?
Regards,
Andres
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Aug 11, 2020 at 12:02 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
I just want to point that Npgsql provider for .Net Core builds queries like that (RETURNING xmin) to keep track for concurrency.
This bug stops us from moving to partitioned tables in Postgres 12 with Npgsql.
That's certainly a good reason to try to make it work. And we can make
it work, if we're willing to assume that everything's a heap table.
But at some point, that hopefully won't be true any more, and then
this whole idea becomes pretty dubious. I think we shouldn't wait
until it happens to start thinking about that problem.
For xmin in particular, you don't have to assume "everything's a heap".
What you have to assume is "everything uses MVCC", which seems a more
defensible position. It'll still fall down for foreign tables that are
partitions, though.
I echo Andres' nearby question about exactly why npgsql has such a
hard dependency on xmin. Maybe what we need is to try to abstract
that a little, and see if we could require all partition members
to support some unified concept of it.
regards, tom lane
Hi,
On 2020-06-04 12:57:30 -0400, Tom Lane wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xminReproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.
The reason we're getting the failure is that nodeModifyTable.c only is
dealing with virtual tuple slots, which don't carry visibility
information. Despite actually having infrastructure for creating a
partition specific slot. If I force check_attrmap_match() to return
false, the example starts working.
I don't really know how to best fix this in the partitioning
infrastructure. Currently the determination whether there needs to be
any conversion between subplan slot and the slot used for insertion is
solely based on comparing tuple descriptors. But for better or worse, we
don't represent system column accesses in tuple descriptors.
It's not that hard to force the slot creation & use whenever there's
returning, but somehow that feels hackish (but so does plenty other
things in execPartition.c). See attached.
But I'm worried that that's not enough: What if somebody in a trigger
wants to access system columns besides tableoid and tid (which are
handled in a generic manner)? Currently - only for partitioned table DML
going through the root table - we'll not have valid values for the
trigger. It's pretty dubious imo to use xmin/xmax in triggers, but ...
I suspect we should just unconditionally use
partrouteinfo->pi_PartitionTupleSlot. Going through
partrouteinfo->pi_RootToPartitionMap if present, and ExecCopySlot()
otherwise.
Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column. There's no meaningful way for some AMs to have xmin / xmax
in a compatible way with heap.
Greetings,
Andres Freund
On 2020-08-11 11:02:31 -0700, Andres Freund wrote:
It's not that hard to force the slot creation & use whenever there's
returning, but somehow that feels hackish (but so does plenty other
things in execPartition.c). See attached.
Actually attached this time.
Attachments:
hack.difftext/x-diff; charset=us-asciiDownload
diff --git i/src/backend/executor/execPartition.c w/src/backend/executor/execPartition.c
index 79fcbd6b066..152a80cc6c9 100644
--- i/src/backend/executor/execPartition.c
+++ w/src/backend/executor/execPartition.c
@@ -903,8 +903,14 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
* a slot dedicated to storing this partition's tuples. The slot is used
* for various operations that are applied to tuples after routing, such
* as checking constraints.
+ *
+ * Even if there's no difference between the normal columns, RETURNING
+ * might access system columns. So whenever RETURNING is specified we
+ * force creation of the slot as well - which will cause use of the right
+ * slot type for RETURNING processing. XXX: But what about triggers etc?
*/
- if (partrouteinfo->pi_RootToPartitionMap != NULL)
+ if (partrouteinfo->pi_RootToPartitionMap != NULL ||
+ partRelInfo->ri_returningList)
{
Relation partrel = partRelInfo->ri_RelationDesc;
diff --git i/src/backend/executor/nodeModifyTable.c w/src/backend/executor/nodeModifyTable.c
index 20a4c474cc4..eb318ac3f73 100644
--- i/src/backend/executor/nodeModifyTable.c
+++ w/src/backend/executor/nodeModifyTable.c
@@ -1949,6 +1949,12 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
}
+ else if (partrouteinfo->pi_PartitionTupleSlot)
+ {
+ TupleTableSlot *new_slot = partrouteinfo->pi_PartitionTupleSlot;
+
+ slot = ExecCopySlot(new_slot, slot);
+ }
return slot;
}
Hi,
On 2020-08-11 13:52:00 -0400, Tom Lane wrote:
For xmin in particular, you don't have to assume "everything's a heap".
What you have to assume is "everything uses MVCC", which seems a more
defensible position. It'll still fall down for foreign tables that are
partitions, though.
Don't think that necessarily implies having a compatible xmin / xmax
around. Certainly not a 32bit one. I guess an AM could always return
InvalidOid, but that doesn't seem particularly helpful either.
I think it'd be better if we actually tried to provide a way to do
whatever xmin is being used properly. I've seen many uses of xmin/xmax
and many of them didn't work at all, and most of remaining ones only
worked in common cases.
Greetings,
Andres Freund
<div>Sure,</div><div> </div><div>Entity Framework is an ORM for .Net (and .Net Core). It has different providers for different databases (NpgSql for Postgres). It uses Optimistic concurrency. The common use case is to use xmin as "concurrency token".</div><div> </div><div>In code we make "var e = new Entity();", "dbContext.Add(e)" and "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us, classical ORM;</div><div> </div><div>When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it as concurrency token for further updates (update is made like "where id = [id] AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).</div><div> </div><div><a href="https://www.npgsql.org/efcore/modeling/concurrency.html">https://www.npgsql.org/efcore/modeling/concurrency.html</a></div><div> </div><div> </div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 20:22, "Andres Freund" <andres@anarazel.de>:</div><blockquote><p>Hi,<br /><br />On 2020-08-11 19:02:57 +0300, Pavel Biryukov wrote:</p><blockquote> I just want to point that Npgsql provider for .Net Core builds queries like<br /> that (RETURNING xmin) to keep track for concurrency.</blockquote><p><br />Could you provide a bit more details about what that's actually used<br />for?<br /><br />Regards,<br /><br />Andres</p></blockquote>
Hi,
On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:
Entity Framework is an ORM for .Net (and .Net Core). It has different providers
for different databases (NpgSql for Postgres). It uses Optimistic concurrency.
The common use case is to use xmin as "concurrency token".
�
In code we make "var e = new Entity();", "dbContext.Add(e)" and
"dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,
classical ORM;
�
When new row is inserted, EF makes an insert with "RETURNING�xmin" to keep it
as concurrency token for further updates (update is made like "where id = [id]
AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).
That's not really a safe use of xmin, e.g. it could have wrapped around
leading you to not notice a concurrent modification. Nor does it
properly deal with multiple statements within a transaction. Perhaps
those are low enough risk for you, but I don't xmin is a decent building
block for this kind of thing.
Greetings,
Andres Freund
<div> </div><div>I don't see a problem with "wrapping around" - the row's xmin does not change with freeze (AFAIK). It changes when the row is modified.</div><div>So event if you hold some entity (with current xmin) for a long time (enough for "wrap around") and then try to update it, it will update ok.</div><div> </div><div>"Multiple statements" possible problems is managed for us by NpgSQL :)</div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 21:39, "Andres Freund" <andres@anarazel.de>:</div><blockquote><p>Hi,<br /><br />On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:</p><blockquote> Entity Framework is an ORM for .Net (and .Net Core). It has different providers<br /> for different databases (NpgSql for Postgres). It uses Optimistic concurrency.<br /> The common use case is to use xmin as "concurrency token".<br /> <br /> In code we make "var e = new Entity();", "dbContext.Add(e)" and<br /> "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,<br /> classical ORM;<br /> <br /> When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it<br /> as concurrency token for further updates (update is made like "where id = [id]<br /> AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).</blockquote><p><br />That's not really a safe use of xmin, e.g. it could have wrapped around<br />leading you to not notice a concurrent modification. Nor does it<br />properly deal with multiple statements within a transaction. Perhaps<br />those are low enough risk for you, but I don't xmin is a decent building<br />block for this kind of thing.<br /><br />Greetings,<br /><br />Andres Freund</p></blockquote>
<div>Oh, I understand! There is a 1/(2^32) chance that after wrapping around and update it has the same xmin but different content and we don't notice it :)</div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 21:39, "Andres Freund" <andres@anarazel.de>:</div><blockquote><p>Hi,<br /><br />On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:</p><blockquote> Entity Framework is an ORM for .Net (and .Net Core). It has different providers<br /> for different databases (NpgSql for Postgres). It uses Optimistic concurrency.<br /> The common use case is to use xmin as "concurrency token".<br /> <br /> In code we make "var e = new Entity();", "dbContext.Add(e)" and<br /> "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,<br /> classical ORM;<br /> <br /> When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it<br /> as concurrency token for further updates (update is made like "where id = [id]<br /> AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).</blockquote><p><br />That's not really a safe use of xmin, e.g. it could have wrapped around<br />leading you to not notice a concurrent modification. Nor does it<br />properly deal with multiple statements within a transaction. Perhaps<br />those are low enough risk for you, but I don't xmin is a decent building<br />block for this kind of thing.<br /><br />Greetings,<br /><br />Andres Freund</p></blockquote>
Hi,
On 2020-08-11 21:55:32 +0300, Pavel Biryukov wrote:
I don't see a problem with "wrapping around" - the row's xmin does not change
with freeze (AFAIK). It changes when the row is modified.
So event if you hold some entity (with current xmin) for a long time (enough
for "wrap around") and then try to update it, it will update ok.
The problem isn't that it won't update ok, it is that it might update
despite there being another update since the RETURNING xmin.
s1) BEGIN;INSERT ... RETURN xmin;COMMIT;
s2) BEGIN;UPDATE .. WHERE xmin ...; COMMIT;
s*) WRAPAROUND;
s1) BEGIN;UPDATE .. WHERE xmin ...; COMMIT;
this could lead to s1 not noticing that s2 was updated.
- Andres
Hi,
On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
On 2020-06-04 12:57:30 -0400, Tom Lane wrote:
Pavel Biryukov <79166341370@yandex.ru> writes:
Hello. I'm catch error "virtual tuple table slot does not have system
attributes" when inserting row into partitioned table with RETURNING
xminReproduced here. The example works in v11, and in v10 if you remove
the unnecessary-to-the-example primary key, so it seems like a clear
regression. I didn't dig for the cause but I suppose it's related
to Andres' slot-related changes.The reason we're getting the failure is that nodeModifyTable.c only is
dealing with virtual tuple slots, which don't carry visibility
information. Despite actually having infrastructure for creating a
partition specific slot. If I force check_attrmap_match() to return
false, the example starts working.I don't really know how to best fix this in the partitioning
infrastructure. Currently the determination whether there needs to be
any conversion between subplan slot and the slot used for insertion is
solely based on comparing tuple descriptors. But for better or worse, we
don't represent system column accesses in tuple descriptors.It's not that hard to force the slot creation & use whenever there's
returning, but somehow that feels hackish (but so does plenty other
things in execPartition.c). See attached.But I'm worried that that's not enough: What if somebody in a trigger
wants to access system columns besides tableoid and tid (which are
handled in a generic manner)? Currently - only for partitioned table DML
going through the root table - we'll not have valid values for the
trigger. It's pretty dubious imo to use xmin/xmax in triggers, but ...I suspect we should just unconditionally use
partrouteinfo->pi_PartitionTupleSlot. Going through
partrouteinfo->pi_RootToPartitionMap if present, and ExecCopySlot()
otherwise.
I see that to be the only way forward even though there will be a
slight hit in performance in typical cases where a virtual tuple slot
suffices.
Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column.
Performance-wise I would prefer the former, because the latter would
involve checking *all* partitions statically in the INSERT case,
something that we've avoided doing so far.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column.
Performance-wise I would prefer the former, because the latter would
involve checking *all* partitions statically in the INSERT case,
something that we've avoided doing so far.
It's not like we don't have a technology for doing that. The way this
ideally would work, IMV, is that the parent partitioned table either
has or doesn't have a given system column. If it does, then every
child must too, just like the way things work for user columns.
This'd require (a) some sort of consensus about which kinds of system
columns can make sense --- as Andres noted, 32-bit xmin might not be
the best choice here --- and (b) some notation for users to declare
which of these columns they want in a partitioned table. Once upon
a time we had WITH OIDS, maybe that idea could be extended.
I'm not entirely sure that this is worth all the trouble, but that's
how I'd sketch doing it.
regards, tom lane
On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column.Performance-wise I would prefer the former, because the latter would
involve checking *all* partitions statically in the INSERT case,
something that we've avoided doing so far.It's not like we don't have a technology for doing that. The way this
ideally would work, IMV, is that the parent partitioned table either
has or doesn't have a given system column. If it does, then every
child must too, just like the way things work for user columns.
Ah, I may have misread "insisting that all partitions have a given
system column" as doing that on every query, but maybe Andres meant
what you are describing here.
This'd require (a) some sort of consensus about which kinds of system
columns can make sense --- as Andres noted, 32-bit xmin might not be
the best choice here --- and (b) some notation for users to declare
which of these columns they want in a partitioned table. Once upon
a time we had WITH OIDS, maybe that idea could be extended.
For (a), isn't there already a consensus that all table AMs support at
least the set of system columns described in 5.5 System Columns [1]https://www.postgresql.org/docs/current/ddl-system-columns.html
even if the individual members of that set are no longer the best
choice at this point? I do agree that we'd need (b) in some form to
require AMs to fill those columns which it seems is not the case
currently.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
[1]: https://www.postgresql.org/docs/current/ddl-system-columns.html
<div> </div><div>With "wrap around" update when the same xmin value is assigned to the row (leading to concurrency detection problem) the solution may be to make sure in Postgres that the new xmin value is different from previous (freezed row with xmin before "wrap around")</div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>11.08.2020, 21:39, "Andres Freund" <andres@anarazel.de>:</div><blockquote><p>Hi,<br /><br />On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:</p><blockquote> Entity Framework is an ORM for .Net (and .Net Core). It has different providers<br /> for different databases (NpgSql for Postgres). It uses Optimistic concurrency.<br /> The common use case is to use xmin as "concurrency token".<br /> <br /> In code we make "var e = new Entity();", "dbContext.Add(e)" and<br /> "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,<br /> classical ORM;<br /> <br /> When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it<br /> as concurrency token for further updates (update is made like "where id = [id]<br /> AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).</blockquote><p><br />That's not really a safe use of xmin, e.g. it could have wrapped around<br />leading you to not notice a concurrent modification. Nor does it<br />properly deal with multiple statements within a transaction. Perhaps<br />those are low enough risk for you, but I don't xmin is a decent building<br />block for this kind of thing.<br /><br />Greetings,<br /><br />Andres Freund</p></blockquote>
Hi,
while the request for returning xmin of partitioned tables is still valid, i’d like to add some information and a possible workaround.
On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:
Entity Framework is an ORM for .Net (and .Net Core). It has different providers
for different databases (NpgSql for Postgres). It uses Optimistic concurrency.
The common use case is to use xmin as "concurrency token".
In code we make "var e = new Entity();", "dbContext.Add(e)" and
"dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,
classical ORM;
When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it
as concurrency token for further updates (update is made like "where id = [id]
AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).
Neither the Entity Framework, nor npgsql rely on the column xmin. Both don’t know about this column in their codebase.
In the case oft he EF i’m sure that this holds true for all versions, since it is designed as DBMS independant, and as such will never know anything about a PostgreSQL specific column.
Also you can use any ADO.Net provider to connect to a concrete DBMS – i for example use dotConnect for PostgreSQL because it provided more features and less bugs as Npgsql at the time of decission.
As for Npgsql i have only checked that the current HEAD has no reference to xmin in its source code.
With that in mind, i assume the OP included the column xmin in his Entity Model by himself and set the ConcurrencyMode to fixed for that column.
As xmin is a system column that the EF should never try to update (PostgreSQL will reject this attempt, i think), i’d suggest using a self defined column (row_version for example) and either use triggers on update and insert to increment its value (works even with updates outside of EF) or let the EF do the increment.
Regards
Wilm Hoyer.
<div>Wilm,</div><div> </div><div>Have a look, there is even an extension method for xmin (stable branch):</div><div> </div><div><a href="https://github.com/npgsql/efcore.pg/blob/stable/src/EFCore.PG/Extensions/NpgsqlEntityTypeBuilderExtensions.cs">https://github.com/npgsql/efcore.pg/blob/stable/src/EFCore.PG/Extensions/NpgsqlEntityTypeBuilderExtensions.cs</a></div><div> </div><div>EF Core is mostly independent from DB, but there usually are some "tweaks" for each database that you should know (like for SQL server the native concurrency token should be like byte[]:<ol style="box-sizing:border-box;color:rgb( 51 , 51 , 51 );font-family:'menlo' , 'monaco' , 'consolas' , 'courier new' , monospace;font-size:13px;font-style:normal;font-weight:400;margin-bottom:0px;margin-top:0px;padding-left:0px;text-decoration-style:initial;text-indent:0px;text-transform:none;white-space:pre;word-spacing:0px"><li style="box-sizing:border-box;list-style-type:none;margin:0px"><code style="background:rgb( 249 , 249 , 249 );border-radius:0px;box-sizing:border-box;color:inherit;font-family:'menlo' , 'monaco' , 'consolas' , 'courier new' , monospace;font-size:inherit;padding:0px;white-space:pre-wrap"><span style="box-sizing:border-box;color:#666600">[</span><span style="box-sizing:border-box;color:#555555">TimeStamp</span><span style="box-sizing:border-box;color:#666600">]</span></code></li><li style="background:rgb( 249 , 249 , 249 );box-sizing:border-box;list-style-type:none;margin:0px"><code style="background:rgb( 249 , 249 , 249 );border-radius:0px;box-sizing:border-box;color:inherit;font-family:'menlo' , 'monaco' , 'consolas' , 'courier new' , monospace;font-size:inherit;padding:0px;white-space:pre-wrap"><span style="box-sizing:border-box;color:#0000ff">public</span><span style="box-sizing:border-box;color:#000000"> </span><span style="box-sizing:border-box;color:#0000ff">byte</span><span style="box-sizing:border-box;color:#666600">[]</span><span style="box-sizing:border-box;color:#000000"> </span><span style="box-sizing:border-box;color:#555555">RowVersion</span><span style="box-sizing:border-box;color:#000000"> </span><span style="box-sizing:border-box;color:#666600">{<!-- --></span><span style="box-sizing:border-box;color:#000000"> </span><span style="box-sizing:border-box;color:#0000ff">get</span><span style="box-sizing:border-box;color:#666600">;</span><span style="box-sizing:border-box;color:#000000"> </span><span style="box-sizing:border-box;color:#0000ff">set</span><span style="box-sizing:border-box;color:#666600">;</span><span style="box-sizing:border-box;color:#000000"> </span><span style="box-sizing:border-box;color:#666600">}</span></code></li><li style="background:rgb( 249 , 249 , 249 );box-sizing:border-box;list-style-type:none;margin:0px"><code style="background:rgb( 249 , 249 , 249 );border-radius:0px;box-sizing:border-box;color:inherit;font-family:'menlo' , 'monaco' , 'consolas' , 'courier new' , monospace;font-size:inherit;padding:0px;white-space:pre-wrap"><span style="box-sizing:border-box;color:#666600">)</span></code></li></ol></div><div> </div><div>Additional "self defined column" for billion rows tables leads to additional space needed. We use partitioning when the tables are LARGE :)</div><div> </div><div>This works fine in 10, 11, it's strange it is broken in 12 (from db users point of view, I haven't examined the sources of PG for internals...)</div><div> </div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>12.08.2020, 12:35, "Wilm Hoyer" <w.hoyer@dental-vision.de>:</div><blockquote><div lang="DE"><div><blockquote style="margin-bottom:5pt;margin-top:5pt"><p>Hi,</p><p>while the request for returning xmin of partitioned tables is still valid, i’d like to add some information and a possible workaround.</p><p><br />On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:</p><blockquote style="margin-bottom:5pt;margin-top:5pt"><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> Entity Framework is an ORM for .Net (and .Net Core). It has different providers<br /> for different databases (NpgSql for Postgres). It uses Optimistic concurrency.<br /> The common use case is to use xmin as "concurrency token".<br /> <br /> In code we make "var e = new Entity();", "dbContext.Add(e)" and<br /> "dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,<br /> classical ORM;<br /> <br /> When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it<br /> as concurrency token for further updates (update is made like "where id = [id]<br /> AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Neither the Entity Framework, nor npgsql rely on the column xmin. Both don’t know about this column in their codebase.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">In the case oft he EF i’m sure that this holds true for all versions, since it is designed as DBMS independant, and as such will never know anything about a PostgreSQL specific column.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Also you can use any ADO.Net provider to connect to a concrete DBMS – i for example use dotConnect for PostgreSQL because it provided more features and less bugs as Npgsql at the time of decission.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">As for Npgsql i have only checked that the current HEAD has no reference to xmin in its source code.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">With that in mind, i assume the OP included the column xmin in his Entity Model by himself and set the ConcurrencyMode to fixed for that column.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">As xmin is a system column that the EF should never try to update (PostgreSQL will reject this attempt, i think), i’d suggest using a self defined column (row_version for example) and either use triggers on update and insert to increment its value (works even with updates outside of EF) or let the EF do the increment.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Regards</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Wilm Hoyer.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></blockquote></blockquote></div></div></blockquote>
Wilm,
Have a look, there is even an extension method for xmin (stable branch):
EF Core is mostly independent from DB, but there usually are some "tweaks" for each database that you should know (like for SQL server the native concurrency token should be like byte[]:
1. [TimeStamp]
2. public byte[] RowVersion { get; set; }
3. )
Additional "self defined column" for billion rows tables leads to additional space needed. We use partitioning when the tables are LARGE :)
This works fine in 10, 11, it's strange it is broken in 12 (from db users point of view, I haven't examined the sources of PG for internals...)
Sorry, i had not looked into the extension since it is not part of core npgsql. As it states: Npgsql.EntityFrameworkCore.PostgreSQL is an Entity Framework Core provider built on top of Npgsql<https://github.com/npgsql/npgsql>. (and we left npgsql before this extension was started).
I guess it would be easy for them to implement a solution, because that basically means adopting the relevant part from the EF SqlServer implementation. But i fear you’re out of luck with them, as support for table partitioning is still an open issue in their project.
Best regards
Wilm.
<div>Wilm,</div><div> </div><div>For different developers to begin some "adoptation" there should be a clear "Breaking changes" record in Postgres version history.</div><div> </div><div>For now we consider it bug (worked before, not working now), that should be just fixed...</div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>12.08.2020, 13:57, "Wilm Hoyer" <w.hoyer@dental-vision.de>:</div><blockquote><div lang="DE"><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">> Wilm,</p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">> Have a look, there is even an extension method for xmin (stable branch):</p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">> <a href="https://github.com/npgsql/efcore.pg/blob/stable/src/EFCore.PG/Extensions/NpgsqlEntityTypeBuilderExtensions.cs"> https://github.com/npgsql/efcore.pg/blob/stable/src/EFCore.PG/Extensions/NpgsqlEntityTypeBuilderExtensions.cs</a></p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">> EF Core is mostly independent from DB, but there usually are some "tweaks" for each database that you should know (like for SQL server the native concurrency token should be like byte[]:</p><ol type="1" style="margin-bottom:0cm;margin-top:0cm"><li style="box-sizing:border-box;color:#333333;font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"><code style="font-family:'courier new'"><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">[</span><span style="background:#f9f9f9;color:#555555;font-family:'consolas';font-size:10pt">TimeStamp</span><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">]</span></code></li><li style="background:#f9f9f9;box-sizing:border-box;color:#333333;font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"><code style="font-family:'courier new'"><span style="background:#f9f9f9;color:blue;font-family:'consolas';font-size:10pt">public</span><span style="background:#f9f9f9;color:black;font-family:'consolas';font-size:10pt"> </span><span style="background:#f9f9f9;color:blue;font-family:'consolas';font-size:10pt">byte</span><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">[]</span><span style="background:#f9f9f9;color:black;font-family:'consolas';font-size:10pt"> </span><span style="background:#f9f9f9;color:#555555;font-family:'consolas';font-size:10pt">RowVersion</span><span style="background:#f9f9f9;color:black;font-family:'consolas';font-size:10pt"> </span><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">{<!-- --></span><span style="background:#f9f9f9;color:black;font-family:'consolas';font-size:10pt"> </span><span style="background:#f9f9f9;color:blue;font-family:'consolas';font-size:10pt">get</span><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">;</span><span style="background:#f9f9f9;color:black;font-family:'consolas';font-size:10pt"> </span><span style="background:#f9f9f9;color:blue;font-family:'consolas';font-size:10pt">set</span><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">;</span><span style="background:#f9f9f9;color:black;font-family:'consolas';font-size:10pt"> </span><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">}</span></code></li><li style="background:#f9f9f9;box-sizing:border-box;color:#333333;font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"><code style="font-family:'courier new'"><span style="background:#f9f9f9;color:#666600;font-family:'consolas';font-size:10pt">)</span></code></li></ol></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">> Additional "self defined column" for billion rows tables leads to additional space needed. We use partitioning when the tables are LARGE :)</p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">> This works fine in 10, 11, it's strange it is broken in 12 (from db users point of view, I haven't examined the sources of PG for internals...)</p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p></div><div><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Sorry, i had not looked into the extension since it is not part of core npgsql. As it states: Npgsql.EntityFrameworkCore.PostgreSQL is an Entity Framework Core provider built on top of <a href="https://github.com/npgsql/npgsql">Npgsql</a>. (and we left npgsql before this extension was started).</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">I guess it would be easy for them to implement a solution, because that basically means adopting the relevant part from the EF SqlServer implementation. But i fear you’re out of luck with them, as support for table partitioning is still an open issue in their project.</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm"> </p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Best regards</p><p style="font-family:'calibri' , sans-serif;font-size:11pt;margin:0cm">Wilm.</p></div></div></div></blockquote>
Hi,
On 2020-08-12 14:19:12 +0900, Amit Langote wrote:
On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Wed, Aug 12, 2020 at 3:02 AM Andres Freund <andres@anarazel.de> wrote:
Medium term I think we should just plain out forbid references to system
columns in partioned tables Or at least insist that all partitions have
that column.Performance-wise I would prefer the former, because the latter would
involve checking *all* partitions statically in the INSERT case,
something that we've avoided doing so far.It's not like we don't have a technology for doing that. The way this
ideally would work, IMV, is that the parent partitioned table either
has or doesn't have a given system column. If it does, then every
child must too, just like the way things work for user columns.Ah, I may have misread "insisting that all partitions have a given
system column" as doing that on every query, but maybe Andres meant
what you are describing here.
I think Tom's formulation makes sense.
This'd require (a) some sort of consensus about which kinds of system
columns can make sense --- as Andres noted, 32-bit xmin might not be
the best choice here --- and (b) some notation for users to declare
which of these columns they want in a partitioned table. Once upon
a time we had WITH OIDS, maybe that idea could be extended.For (a), isn't there already a consensus that all table AMs support at
least the set of system columns described in 5.5 System Columns [1]
even if the individual members of that set are no longer the best
choice at this point?
I don't think there is. I don't think xmin/xmax/cmin/cmax should be
among those. tableoid and ctid are handled by generic code, so I think
they would be among the required columns.
Where do you see that concensus?
I do agree that we'd need (b) in some form to require AMs to fill
those columns which it seems is not the case currently.
Hm? What are you referencing here?
Greetings,
Andres Freund
Hi,
On Thu, Aug 13, 2020 at 1:27 AM Andres Freund <andres@anarazel.de> wrote:
On 2020-08-12 14:19:12 +0900, Amit Langote wrote:
On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It's not like we don't have a technology for doing that. The way this
ideally would work, IMV, is that the parent partitioned table either
has or doesn't have a given system column. If it does, then every
child must too, just like the way things work for user columns.Ah, I may have misread "insisting that all partitions have a given
system column" as doing that on every query, but maybe Andres meant
what you are describing here.I think Tom's formulation makes sense.
Yes, I agree.
This'd require (a) some sort of consensus about which kinds of system
columns can make sense --- as Andres noted, 32-bit xmin might not be
the best choice here --- and (b) some notation for users to declare
which of these columns they want in a partitioned table. Once upon
a time we had WITH OIDS, maybe that idea could be extended.For (a), isn't there already a consensus that all table AMs support at
least the set of system columns described in 5.5 System Columns [1]
even if the individual members of that set are no longer the best
choice at this point?I don't think there is. I don't think xmin/xmax/cmin/cmax should be
among those. tableoid and ctid are handled by generic code, so I think
they would be among the required columns.Where do you see that concensus?
Perhaps I was wrong to use the word consensus. I was trying to say
that table AM extensibility work didn't change the description in 5.5
System Columns, which still says *all* tables, irrespective of their
AM, implicitly have those columns, so I assumed we continue to ask AM
authors to have space for those columns in their tuples. Maybe, that
list is a legacy of heapam and updating it in am AM-agnostic manner
would require consensus.
I do agree that we'd need (b) in some form to require AMs to fill
those columns which it seems is not the case currently.Hm? What are you referencing here?
I meant that WITH <a-system-column> specified on a table presumably
forces an AM to ensure that the column is present in its tuples, like
WITH OIDS specification on a table would force heapam to initialize
the oid system column in all tuples being inserted into that table.
Lack of the same notation for other system columns means that AMs
don't feel forced to ensure those columns are present in their tuples.
Also, having the WITH notation makes it easy to enforce that all
partitions in a given hierarchy have AMs that respect the WITH
specification.
--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
On Thu, Aug 13, 2020 at 1:27 AM Andres Freund <andres@anarazel.de> wrote:
Hm? What are you referencing here?
I meant that WITH <a-system-column> specified on a table presumably
forces an AM to ensure that the column is present in its tuples, like
WITH OIDS specification on a table would force heapam to initialize
the oid system column in all tuples being inserted into that table.
Lack of the same notation for other system columns means that AMs
don't feel forced to ensure those columns are present in their tuples.
I might be missing some subtlety here, but it seems to me that a
given table AM will probably have a fixed set of system columns
that it provides. For example heapam would provide exactly the
current set of columns, no matter what. I'm imagining that
(1) if the parent partitioned table has column X, and the proposed
child-table AM doesn't provide that, then we just refuse to
create/attach the partition.
(2) if a child-table AM provides some system column that the
parent does not, then you can access that column when querying
the child table directly, but not when querying the parent.
This works just like extra child columns in traditional PG
inheritance.
Given these rules, an AM isn't expected to do anything conditional at
runtime: it just provides what it provides. Instead we have an issue
to solve in or near the TupleTableSlot APIs, namely how to deal with
tuples that don't all have the same system columns. But we'd have
that problem anyway.
regards, tom lane
<div>Hi!</div><div> </div><div>What's the state with this issue?</div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>13.08.2020, 07:13, "Amit Langote" <amitlangote09@gmail.com>:</div><blockquote><p>Hi,<br /><br />On Thu, Aug 13, 2020 at 1:27 AM Andres Freund <<a href="mailto:andres@anarazel.de" rel="noopener noreferrer">andres@anarazel.de</a>> wrote:</p><blockquote> On 2020-08-12 14:19:12 +0900, Amit Langote wrote:<br /> > On Wed, Aug 12, 2020 at 1:08 PM Tom Lane <<a href="mailto:tgl@sss.pgh.pa.us" rel="noopener noreferrer">tgl@sss.pgh.pa.us</a>> wrote:<br /> > > It's not like we don't have a technology for doing that. The way this<br /> > > ideally would work, IMV, is that the parent partitioned table either<br /> > > has or doesn't have a given system column. If it does, then every<br /> > > child must too, just like the way things work for user columns.<br /> ><br /> > Ah, I may have misread "insisting that all partitions have a given<br /> > system column" as doing that on every query, but maybe Andres meant<br /> > what you are describing here.<br /><br /> I think Tom's formulation makes sense.</blockquote><p><br />Yes, I agree.<br /> </p><blockquote> > > This'd require (a) some sort of consensus about which kinds of system<br /> > > columns can make sense --- as Andres noted, 32-bit xmin might not be<br /> > > the best choice here --- and (b) some notation for users to declare<br /> > > which of these columns they want in a partitioned table. Once upon<br /> > > a time we had WITH OIDS, maybe that idea could be extended.<br /> ><br /> > For (a), isn't there already a consensus that all table AMs support at<br /> > least the set of system columns described in 5.5 System Columns [1]<br /> > even if the individual members of that set are no longer the best<br /> > choice at this point?<br /><br /> I don't think there is. I don't think xmin/xmax/cmin/cmax should be<br /> among those. tableoid and ctid are handled by generic code, so I think<br /> they would be among the required columns.<br /><br /> Where do you see that concensus?</blockquote><p><br />Perhaps I was wrong to use the word consensus. I was trying to say<br />that table AM extensibility work didn't change the description in 5.5<br />System Columns, which still says *all* tables, irrespective of their<br />AM, implicitly have those columns, so I assumed we continue to ask AM<br />authors to have space for those columns in their tuples. Maybe, that<br />list is a legacy of heapam and updating it in am AM-agnostic manner<br />would require consensus.<br /> </p><blockquote> > I do agree that we'd need (b) in some form to require AMs to fill<br /> > those columns which it seems is not the case currently.<br /><br /> Hm? What are you referencing here?</blockquote><p><br />I meant that WITH <a-system-column> specified on a table presumably<br />forces an AM to ensure that the column is present in its tuples, like<br />WITH OIDS specification on a table would force heapam to initialize<br />the oid system column in all tuples being inserted into that table.<br />Lack of the same notation for other system columns means that AMs<br />don't feel forced to ensure those columns are present in their tuples.<br />Also, having the WITH notation makes it easy to enforce that all<br />partitions in a given hierarchy have AMs that respect the WITH<br />specification.<br /> </p>--<br />Amit Langote<br />EnterpriseDB: <a href="http://www.enterprisedb.com/" rel="noopener noreferrer">http://www.enterprisedb.com</a></blockquote>
On Tue, Oct 27, 2020 at 10:55 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
Hi!
What's the state with this issue?
I think that the patch that Andres Freund posted earlier on this
thread [1]/messages/by-id/20200811180629.zx57llliqcmcgfyr@alap3.anarazel.de would be fine as a workaround at least for stable releases
v12 and v13. I have attached with this email a rebased version of
that patch, although I also made a few changes. The idea of the patch
is to allocate and use a partition-specific *non-virtual* slot, one
that is capable of providing system columns when the RETURNING
projection needs them. Andres' patch would allocate such a slot even
if RETURNING contained no system columns, whereas I changed the slot
creation code stanza to also check that RETURNING indeed contains
system columns. I've attached 2 patch files: one for HEAD and another
for v12 and v13 branches.
That said, the discussion on what to do going forward to *cleanly*
support accessing system columns through partitioned tables is
pending, but maybe the "workaround" fix will be enough in the meantime
(at least v12 and v13 can only get a workaround fix).
--
Amit Langote
EDB: http://www.enterprisedb.com
[1]: /messages/by-id/20200811180629.zx57llliqcmcgfyr@alap3.anarazel.de
Attachments:
parted-insert-returning_syscols_HEAD_v2.patchapplication/octet-stream; name=parted-insert-returning_syscols_HEAD_v2.patchDownload
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 86594bd..004d480 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/optimizer.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -683,6 +684,9 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
ExprContext *econtext;
List *returningList;
int firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+ int col;
+ Bitmapset *returning_attrs = NULL;
+ bool returning_has_syscol = false;
/* See the comment above for WCO lists. */
Assert((node->operation == CMD_INSERT &&
@@ -730,6 +734,29 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
leaf_part_rri->ri_projectReturning =
ExecBuildProjectionInfo(returningList, econtext, slot,
&mtstate->ps, RelationGetDescr(partrel));
+
+ /*
+ * If RETURNING will access partition's system columns, which the root
+ * partitioned table's virtual tuple slot will not be able to provide,
+ * create a partition-specific slot that will into which the tuple
+ * being inserted will be copied.
+ */
+ pull_varattnos((Node *) returningList, firstVarno, &returning_attrs);
+ col = -1;
+ while ((col = bms_next_member(returning_attrs, col)) >= 0)
+ {
+ AttrNumber attno = col + FirstLowInvalidHeapAttributeNumber;
+
+ if (attno < 0)
+ {
+ returning_has_syscol = true;
+ break;
+ }
+ }
+
+ if (returning_has_syscol)
+ leaf_part_rri->ri_PartitionTupleSlot =
+ table_slot_create(partrel, &estate->es_tupleTable);
}
/* Set up information needed for routing tuples to the partition. */
@@ -969,9 +996,11 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
* If a partition has a different rowtype than the root parent, initialize
* a slot dedicated to storing this partition's tuples. The slot is used
* for various operations that are applied to tuples after routing, such
- * as checking constraints.
+ * as checking constraints. Caller may have already initialized the slot
+ * for RETURNING's perusal.
*/
- if (partRelInfo->ri_RootToPartitionMap != NULL)
+ if (partRelInfo->ri_RootToPartitionMap != NULL &&
+ partRelInfo->ri_PartitionTupleSlot == NULL)
{
Relation partrel = partRelInfo->ri_RelationDesc;
@@ -983,8 +1012,6 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
partRelInfo->ri_PartitionTupleSlot =
table_slot_create(partrel, &estate->es_tupleTable);
}
- else
- partRelInfo->ri_PartitionTupleSlot = NULL;
/*
* If the partition is a foreign table, let the FDW init itself for
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 29e07b7..8fac474 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1904,7 +1904,8 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
}
/*
- * Convert the tuple, if necessary.
+ * Convert the tuple, if necessary, and store in the partition-specific
+ * slot, if any.
*/
map = partrel->ri_RootToPartitionMap;
if (map != NULL)
@@ -1913,6 +1914,12 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
}
+ else if (partrel->ri_PartitionTupleSlot)
+ {
+ TupleTableSlot *new_slot = partrel->ri_PartitionTupleSlot;
+
+ slot = ExecCopySlot(new_slot, slot);
+ }
*partRelInfo = partrel;
return slot;
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index da50ee3..02403a8 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -960,13 +960,13 @@ select tableoid::regclass, * from mcrparted order by a, b;
(11 rows)
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
- returningwrtest
------------------
- (1)
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = pg_current_xact_id()::xid AS xmin_correct;
+ returningwrtest | ctid | xmin_correct
+-----------------+-------+--------------
+ (1) | (0,1) | t
(1 row)
-- check also that the wholerow vars in RETURNING list are converted as needed
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 963faa1..c40a4b9 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -590,10 +590,10 @@ insert into mcrparted values ('aaa', 0), ('b', 0), ('bz', 10), ('c', -10),
select tableoid::regclass, * from mcrparted order by a, b;
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = pg_current_xact_id()::xid AS xmin_correct;
-- check also that the wholerow vars in RETURNING list are converted as needed
alter table returningwrtest add b text;
parted-insert-returning_syscols_BACKPATCH_v2.patchapplication/octet-stream; name=parted-insert-returning_syscols_BACKPATCH_v2.patchDownload
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bd2ea25..df1ea35 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/optimizer.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -733,6 +734,40 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
leaf_part_rri, partidx);
/*
+ * If RETURNING will access partition's system columns, which the root
+ * partitioned table's virtual tuple slot will not be able to provide,
+ * create a partition-specific slot that will into which the tuple being
+ * inserted will be copied. ExecInitRoutingInfo() may already have set
+ * one up for tuple conversion, in which case nothing to do.
+ */
+ if (leaf_part_rri->ri_returningList &&
+ leaf_part_rri->ri_PartitionInfo->pi_PartitionTupleSlot == NULL)
+ {
+ int firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
+ int col;
+ Bitmapset *returning_attrs = NULL;
+ bool returning_has_syscol = false;
+
+ pull_varattnos((Node *) leaf_part_rri->ri_returningList, firstVarno,
+ &returning_attrs);
+ col = -1;
+ while ((col = bms_next_member(returning_attrs, col)) >= 0)
+ {
+ AttrNumber attno = col + FirstLowInvalidHeapAttributeNumber;
+
+ if (attno < 0)
+ {
+ returning_has_syscol = true;
+ break;
+ }
+ }
+
+ if (returning_has_syscol)
+ leaf_part_rri->ri_PartitionInfo->pi_PartitionTupleSlot =
+ table_slot_create(partrel, &estate->es_tupleTable);
+ }
+
+ /*
* If there is an ON CONFLICT clause, initialize state for it.
*/
if (node && node->onConflictAction != ONCONFLICT_NONE)
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 20a4c47..12445c2 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1940,7 +1940,8 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
}
/*
- * Convert the tuple, if necessary.
+ * Convert the tuple, if necessary, and store in the partition-specific
+ * slot, if any.
*/
map = partrouteinfo->pi_RootToPartitionMap;
if (map != NULL)
@@ -1949,6 +1950,12 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
}
+ else if (partrouteinfo->pi_PartitionTupleSlot)
+ {
+ TupleTableSlot *new_slot = partrouteinfo->pi_PartitionTupleSlot;
+
+ slot = ExecCopySlot(new_slot, slot);
+ }
return slot;
}
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45b..b249662 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -962,15 +962,17 @@ select tableoid::regclass, * from mcrparted order by a, b;
(11 rows)
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
- returningwrtest
------------------
- (1)
+CREATE FUNCTION xid_current() RETURNS xid LANGUAGE SQL AS $$SELECT (txid_current() % ((1::int8<<32)))::text::xid;$$;
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = xid_current() AS xmin_correct;
+ returningwrtest | ctid | xmin_correct
+-----------------+-------+--------------
+ (1) | (0,1) | t
(1 row)
+DROP FUNCTION xid_current();
-- check also that the wholerow vars in RETURNING list are converted as needed
alter table returningwrtest add b text;
create table returningwrtest2 (b text, c int, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aac..f966863 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -592,10 +592,12 @@ insert into mcrparted values ('aaa', 0), ('b', 0), ('bz', 10), ('c', -10),
select tableoid::regclass, * from mcrparted order by a, b;
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
+CREATE FUNCTION xid_current() RETURNS xid LANGUAGE SQL AS $$SELECT (txid_current() % ((1::int8<<32)))::text::xid;$$;
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = xid_current() AS xmin_correct;
+DROP FUNCTION xid_current();
-- check also that the wholerow vars in RETURNING list are converted as needed
alter table returningwrtest add b text;
Hi Community
Related to ef core I found another issue. I try to convert a database column from bool to int with ef core migrations.
So if I generate the migration file it will look like as following: ALTER TABLE "Customer" ALTER COLUMN "State" TYPE INT
But in Postgres this statement need the “using” extension so it should look like this: ALTER TABLE " Customer" ALTER COLUMN " State " TYPE INT USING " State"::integer;
In my opinion ef core should be generate the right statement (with using statements) out of the box, so is there an issue with the NPG Provider?
Thank you in advance.
Kind regards
Kenan
Von: Wilm Hoyer <W.Hoyer@dental-vision.de>
Gesendet: Mittwoch, 12. August 2020 11:35
An: Pavel Biryukov <79166341370@yandex.ru>; Andres Freund <andres@anarazel.de>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Robert Haas <robertmhaas@gmail.com>; PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Betreff: AW: posgres 12 bug (partitioned table)
Hi,
while the request for returning xmin of partitioned tables is still valid, i’d like to add some information and a possible workaround.
On 2020-08-11 21:31:52 +0300, Pavel Biryukov wrote:
Entity Framework is an ORM for .Net (and .Net Core). It has different providers
for different databases (NpgSql for Postgres). It uses Optimistic concurrency.
The common use case is to use xmin as "concurrency token".
In code we make "var e = new Entity();", "dbContext.Add(e)" and
"dbContext.SaveChanges()" (smth like that), and EF Core constructs sql for us,
classical ORM;
When new row is inserted, EF makes an insert with "RETURNING xmin" to keep it
as concurrency token for further updates (update is made like "where id = [id]
AND xmin=[xmin]" to be sure the row hasn't been updated by other clients).
Neither the Entity Framework, nor npgsql rely on the column xmin. Both don’t know about this column in their codebase.
In the case oft he EF i’m sure that this holds true for all versions, since it is designed as DBMS independant, and as such will never know anything about a PostgreSQL specific column.
Also you can use any ADO.Net provider to connect to a concrete DBMS – i for example use dotConnect for PostgreSQL because it provided more features and less bugs as Npgsql at the time of decission.
As for Npgsql i have only checked that the current HEAD has no reference to xmin in its source code.
With that in mind, i assume the OP included the column xmin in his Entity Model by himself and set the ConcurrencyMode to fixed for that column.
As xmin is a system column that the EF should never try to update (PostgreSQL will reject this attempt, i think), i’d suggest using a self defined column (row_version for example) and either use triggers on update and insert to increment its value (works even with updates outside of EF) or let the EF do the increment.
Regards
Wilm Hoyer.
On 10/28/20 10:01 AM, Amit Langote wrote:
On Tue, Oct 27, 2020 at 10:55 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
Hi!
What's the state with this issue?
I think that the patch that Andres Freund posted earlier on this
thread [1] would be fine as a workaround at least for stable releases
v12 and v13. I have attached with this email a rebased version of
that patch, although I also made a few changes. The idea of the patch
is to allocate and use a partition-specific *non-virtual* slot, one
that is capable of providing system columns when the RETURNING
projection needs them. Andres' patch would allocate such a slot even
if RETURNING contained no system columns, whereas I changed the slot
creation code stanza to also check that RETURNING indeed contains
system columns. I've attached 2 patch files: one for HEAD and another
for v12 and v13 branches.That said, the discussion on what to do going forward to *cleanly*
support accessing system columns through partitioned tables is
pending, but maybe the "workaround" fix will be enough in the meantime
(at least v12 and v13 can only get a workaround fix).
stgresql.org/message-id/20200811180629.zx57llliqcmcgfyr%40alap3.anarazel.de
This thread has been quiet for while. Does everyone agree with Amit's
proposal to use this patch as a workaround?
Also, is there a CF entry for this issue? If so I am unable to find it.
Regards,
--
-David
david@pgmasters.net
<div>We are waiting for the fix :)</div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>05.03.2021, 19:25, "David Steele" <david@pgmasters.net>:</div><blockquote><p>On 10/28/20 10:01 AM, Amit Langote wrote:</p><blockquote> On Tue, Oct 27, 2020 at 10:55 PM Pavel Biryukov <<a href="mailto:79166341370@yandex.ru" rel="noopener noreferrer">79166341370@yandex.ru</a>> wrote:<blockquote> Hi!<br /><br /> What's the state with this issue?</blockquote> I think that the patch that Andres Freund posted earlier on this<br /> thread [1] would be fine as a workaround at least for stable releases<br /> v12 and v13. I have attached with this email a rebased version of<br /> that patch, although I also made a few changes. The idea of the patch<br /> is to allocate and use a partition-specific *non-virtual* slot, one<br /> that is capable of providing system columns when the RETURNING<br /> projection needs them. Andres' patch would allocate such a slot even<br /> if RETURNING contained no system columns, whereas I changed the slot<br /> creation code stanza to also check that RETURNING indeed contains<br /> system columns. I've attached 2 patch files: one for HEAD and another<br /> for v12 and v13 branches.<br /><br /> That said, the discussion on what to do going forward to *cleanly*<br /> support accessing system columns through partitioned tables is<br /> pending, but maybe the "workaround" fix will be enough in the meantime<br /> (at least v12 and v13 can only get a workaround fix).<br /> stgresql.org/message-id/20200811180629.zx57llliqcmcgfyr%40alap3.anarazel.de</blockquote><p><br />This thread has been quiet for while. Does everyone agree with Amit's<br />proposal to use this patch as a workaround?<br /><br />Also, is there a CF entry for this issue? If so I am unable to find it.<br /><br />Regards,<br /> </p>--<br />-David<br /><a href="mailto:david@pgmasters.net" rel="noopener noreferrer">david@pgmasters.net</a><br /> </blockquote>
Sorry for the noise to everyone who will see the below reply a 2nd
time, but doing...
On Fri, Mar 12, 2021 at 7:16 PM Amit Langote <amitlangote09@gmail.com> wrote:
(adding -hackers)
..this is in violation of list rules which I should've known. So I'm
re-replying after removing -hackers.
On Sat, Mar 6, 2021 at 1:25 AM David Steele <david@pgmasters.net> wrote:
On 10/28/20 10:01 AM, Amit Langote wrote:
On Tue, Oct 27, 2020 at 10:55 PM Pavel Biryukov <79166341370@yandex.ru> wrote:
Hi!
What's the state with this issue?
I think that the patch that Andres Freund posted earlier on this
thread [1] would be fine as a workaround at least for stable releases
v12 and v13. I have attached with this email a rebased version of
that patch, although I also made a few changes. The idea of the patch
is to allocate and use a partition-specific *non-virtual* slot, one
that is capable of providing system columns when the RETURNING
projection needs them. Andres' patch would allocate such a slot even
if RETURNING contained no system columns, whereas I changed the slot
creation code stanza to also check that RETURNING indeed contains
system columns. I've attached 2 patch files: one for HEAD and another
for v12 and v13 branches.That said, the discussion on what to do going forward to *cleanly*
support accessing system columns through partitioned tables is
pending, but maybe the "workaround" fix will be enough in the meantime
(at least v12 and v13 can only get a workaround fix).
stgresql.org/message-id/20200811180629.zx57llliqcmcgfyr%40alap3.anarazel.deThis thread has been quiet for while.
Sorry David, I'm a week late myself in noticing this reminder.
Does everyone agree with Amit's
proposal to use this patch as a workaround?I'm attaching an updated version of the patches with a commit message
this time. Two files like before -- one that applies to both v12 and
v13 and another to the master branch.Also, is there a CF entry for this issue? If so I am unable to find it.
Unfortunately, I never created one.
Patches as described above are attached again.
Forgot to ask in the last reply: for HEAD, should we consider simply
preventing accessing system columns when querying through a parent
partitioned table, as was also mentioned upthread?
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Fix-RETURNING-system-columns-when-inserting-into-_BACKPATCH_v12_13.patchapplication/octet-stream; name=v3-0001-Fix-RETURNING-system-columns-when-inserting-into-_BACKPATCH_v12_13.patchDownload
From 5b69833982406fcfc833981f17bd2e2e96f9e5c8 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 12 Mar 2021 17:24:04 +0900
Subject: [PATCH v3] Fix RETURNING system columns when inserting into
partitioned tables
Currently, the use of a virtual TupleTableSlots to hold the tuples
being inserted into partitioned tables leads to the following fixed
error when the RETURNING list contains references to system columns:
ERROR: virtual tuple table slot does not have system attributes
This commit arranges the inserted tuple to be placed in a
TupleTableSlot specified by a leaf partition's table AM after the
tuple is routed to it. That allows any references to system columns
present in the RETURNING list to be resolved appropriately by the
leaf partition table AM's TupleTableSlotOps implementation.
---
src/backend/executor/execPartition.c | 35 ++++++++++++++++++++++++--
src/backend/executor/nodeModifyTable.c | 13 +++++-----
src/test/regress/expected/insert.out | 12 +++++----
src/test/regress/sql/insert.sql | 6 +++--
4 files changed, 51 insertions(+), 15 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e64bb2b605..63ab32e1e4 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/optimizer.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -168,6 +169,7 @@ static ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
PartitionDispatch dispatch,
ResultRelInfo *rootResultRelInfo,
int partidx);
+static bool tlist_has_system_column(List *returningList, int firstVarno);
static void ExecInitRoutingInfo(ModifyTableState *mtstate,
EState *estate,
PartitionTupleRouting *proute,
@@ -924,6 +926,31 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
return leaf_part_rri;
}
+/*
+ * Returns if any of the Vars in returningList refers to a system column.
+ */
+static bool
+tlist_has_system_column(List *returningList, int firstVarno)
+{
+ Bitmapset *returning_attrs = NULL;
+ int col;
+
+ if (returningList == NIL)
+ return false;
+
+ pull_varattnos((Node *) returningList, firstVarno, &returning_attrs);
+ col = -1;
+ while ((col = bms_next_member(returning_attrs, col)) >= 0)
+ {
+ AttrNumber attno = col + FirstLowInvalidHeapAttributeNumber;
+
+ if (attno < 0)
+ return true;
+ }
+
+ return false;
+}
+
/*
* ExecInitRoutingInfo
* Set up information needed for translating tuples between root
@@ -942,6 +969,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
MemoryContext oldcxt;
PartitionRoutingInfo *partrouteinfo;
int rri_index;
+ int firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
oldcxt = MemoryContextSwitchTo(proute->memcxt);
@@ -959,9 +987,12 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
* If a partition has a different rowtype than the root parent, initialize
* a slot dedicated to storing this partition's tuples. The slot is used
* for various operations that are applied to tuples after routing, such
- * as checking constraints.
+ * as checking constraints. We would also need this slot when there are
+ * system columns present in the query's RETURNING list, because the root
+ * parent's virtual slot would fail on trying to access them.
*/
- if (partrouteinfo->pi_RootToPartitionMap != NULL)
+ if (partrouteinfo->pi_RootToPartitionMap != NULL ||
+ tlist_has_system_column(partRelInfo->ri_returningList, firstVarno))
{
Relation partrel = partRelInfo->ri_RelationDesc;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9572ef0690..ef449e4954 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1889,6 +1889,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
ResultRelInfo *partrel;
PartitionRoutingInfo *partrouteinfo;
TupleConversionMap *map;
+ TupleTableSlot *partition_slot;
/*
* Lookup the target partition's ResultRelInfo. If ExecFindPartition does
@@ -1940,15 +1941,15 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
}
/*
- * Convert the tuple, if necessary.
+ * Convert the tuple, if necessary, and store in the partition-specific
+ * slot, if any.
*/
map = partrouteinfo->pi_RootToPartitionMap;
+ partition_slot = partrouteinfo->pi_PartitionTupleSlot;
if (map != NULL)
- {
- TupleTableSlot *new_slot = partrouteinfo->pi_PartitionTupleSlot;
-
- slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
- }
+ slot = execute_attr_map_slot(map->attrMap, slot, partition_slot);
+ else if (partition_slot)
+ slot = ExecCopySlot(partition_slot, slot);
return slot;
}
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index eb9d45be5e..b249662583 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -962,15 +962,17 @@ select tableoid::regclass, * from mcrparted order by a, b;
(11 rows)
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
- returningwrtest
------------------
- (1)
+CREATE FUNCTION xid_current() RETURNS xid LANGUAGE SQL AS $$SELECT (txid_current() % ((1::int8<<32)))::text::xid;$$;
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = xid_current() AS xmin_correct;
+ returningwrtest | ctid | xmin_correct
+-----------------+-------+--------------
+ (1) | (0,1) | t
(1 row)
+DROP FUNCTION xid_current();
-- check also that the wholerow vars in RETURNING list are converted as needed
alter table returningwrtest add b text;
create table returningwrtest2 (b text, c int, a int);
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index ffd4aacbc4..f966863c14 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -592,10 +592,12 @@ insert into mcrparted values ('aaa', 0), ('b', 0), ('bz', 10), ('c', -10),
select tableoid::regclass, * from mcrparted order by a, b;
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
+CREATE FUNCTION xid_current() RETURNS xid LANGUAGE SQL AS $$SELECT (txid_current() % ((1::int8<<32)))::text::xid;$$;
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = xid_current() AS xmin_correct;
+DROP FUNCTION xid_current();
-- check also that the wholerow vars in RETURNING list are converted as needed
alter table returningwrtest add b text;
--
2.24.1
v3-0001-Fix-RETURNING-system-columns-when-inserting-into-_HEAD.patchapplication/octet-stream; name=v3-0001-Fix-RETURNING-system-columns-when-inserting-into-_HEAD.patchDownload
From b43fcbbfabdead3faab40f27117a6feeaaf23857 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Fri, 12 Mar 2021 17:24:04 +0900
Subject: [PATCH v4] Fix RETURNING system columns when inserting into
partitioned tables
Currently, the use of a virtual TupleTableSlots to hold the tuples
being inserted into partitioned tables leads to the following fixed
error when the RETURNING list contains references to system columns:
ERROR: virtual tuple table slot does not have system attributes
This commit arranges the inserted tuple to be placed in a
TupleTableSlot specified by a leaf partition's table AM after the
tuple is routed to it. That allows any references to system columns
present in the RETURNING list to be resolved appropriately by the
leaf partition table AM's TupleTableSlotOps implementation.
---
src/backend/executor/execPartition.c | 37 +++++++++++++++++++++++---
src/backend/executor/nodeModifyTable.c | 13 ++++-----
src/test/regress/expected/insert.out | 10 +++----
src/test/regress/sql/insert.sql | 4 +--
4 files changed, 47 insertions(+), 17 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b8da4c5967..077ded4d3b 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/optimizer.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -168,6 +169,7 @@ static ResultRelInfo *ExecInitPartitionInfo(ModifyTableState *mtstate,
PartitionDispatch dispatch,
ResultRelInfo *rootResultRelInfo,
int partidx);
+static bool tlist_has_system_column(List *returningList, int firstVarno);
static void ExecInitRoutingInfo(ModifyTableState *mtstate,
EState *estate,
PartitionTupleRouting *proute,
@@ -936,6 +938,31 @@ ExecInitPartitionInfo(ModifyTableState *mtstate, EState *estate,
return leaf_part_rri;
}
+/*
+ * Returns if any of the Vars in returningList refers to a system column.
+ */
+static bool
+tlist_has_system_column(List *returningList, int firstVarno)
+{
+ Bitmapset *returning_attrs = NULL;
+ int col;
+
+ if (returningList == NIL)
+ return false;
+
+ pull_varattnos((Node *) returningList, firstVarno, &returning_attrs);
+ col = -1;
+ while ((col = bms_next_member(returning_attrs, col)) >= 0)
+ {
+ AttrNumber attno = col + FirstLowInvalidHeapAttributeNumber;
+
+ if (attno < 0)
+ return true;
+ }
+
+ return false;
+}
+
/*
* ExecInitRoutingInfo
* Set up information needed for translating tuples between root
@@ -953,6 +980,7 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
ResultRelInfo *rootRelInfo = partRelInfo->ri_RootResultRelInfo;
MemoryContext oldcxt;
int rri_index;
+ int firstVarno = mtstate->resultRelInfo[0].ri_RangeTableIndex;
oldcxt = MemoryContextSwitchTo(proute->memcxt);
@@ -968,9 +996,12 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
* If a partition has a different rowtype than the root parent, initialize
* a slot dedicated to storing this partition's tuples. The slot is used
* for various operations that are applied to tuples after routing, such
- * as checking constraints.
+ * as checking constraints. We would also need this slot when there are
+ * system columns present in the query's RETURNING list, because the root
+ * parent's virtual slot would fail on trying to access them.
*/
- if (partRelInfo->ri_RootToPartitionMap != NULL)
+ if (partRelInfo->ri_RootToPartitionMap != NULL ||
+ tlist_has_system_column(partRelInfo->ri_returningList, firstVarno))
{
Relation partrel = partRelInfo->ri_RelationDesc;
@@ -982,8 +1013,6 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
partRelInfo->ri_PartitionTupleSlot =
table_slot_create(partrel, &estate->es_tupleTable);
}
- else
- partRelInfo->ri_PartitionTupleSlot = NULL;
/*
* If the partition is a foreign table, let the FDW init itself for
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 2993ba43e3..143eda16a7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -1993,6 +1993,7 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
{
ResultRelInfo *partrel;
TupleConversionMap *map;
+ TupleTableSlot *partition_slot;
/*
* Lookup the target partition's ResultRelInfo. If ExecFindPartition does
@@ -2022,15 +2023,15 @@ ExecPrepareTupleRouting(ModifyTableState *mtstate,
}
/*
- * Convert the tuple, if necessary.
+ * Convert the tuple, if necessary, and store in the partition-specific
+ * slot, if any.
*/
map = partrel->ri_RootToPartitionMap;
+ partition_slot = partrel->ri_PartitionTupleSlot;
if (map != NULL)
- {
- TupleTableSlot *new_slot = partrel->ri_PartitionTupleSlot;
-
- slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
- }
+ slot = execute_attr_map_slot(map->attrMap, slot, partition_slot);
+ else if (partition_slot)
+ slot = ExecCopySlot(partition_slot, slot);
*partRelInfo = partrel;
return slot;
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index da50ee3b67..02403a8505 100644
--- a/src/test/regress/expected/insert.out
+++ b/src/test/regress/expected/insert.out
@@ -960,13 +960,13 @@ select tableoid::regclass, * from mcrparted order by a, b;
(11 rows)
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
- returningwrtest
------------------
- (1)
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = pg_current_xact_id()::xid AS xmin_correct;
+ returningwrtest | ctid | xmin_correct
+-----------------+-------+--------------
+ (1) | (0,1) | t
(1 row)
-- check also that the wholerow vars in RETURNING list are converted as needed
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index 963faa1614..c40a4b9c31 100644
--- a/src/test/regress/sql/insert.sql
+++ b/src/test/regress/sql/insert.sql
@@ -590,10 +590,10 @@ insert into mcrparted values ('aaa', 0), ('b', 0), ('bz', 10), ('c', -10),
select tableoid::regclass, * from mcrparted order by a, b;
drop table mcrparted;
--- check that wholerow vars in the RETURNING list work with partitioned tables
+-- check that wholerow and system columns in the RETURNING list work with partitioned tables
create table returningwrtest (a int) partition by list (a);
create table returningwrtest1 partition of returningwrtest for values in (1);
-insert into returningwrtest values (1) returning returningwrtest;
+insert into returningwrtest values (1) returning returningwrtest, ctid, xmin = pg_current_xact_id()::xid AS xmin_correct;
-- check also that the wholerow vars in RETURNING list are converted as needed
alter table returningwrtest add b text;
--
2.24.1
Import Notes
Reply to msg id not found: CA+HiwqG7qhKJnFpZOqyrFngqcMzja+pgssMK73b+-BZffxqw@mail.gmail.com
Amit Langote <amitlangote09@gmail.com> writes:
Forgot to ask in the last reply: for HEAD, should we consider simply
preventing accessing system columns when querying through a parent
partitioned table, as was also mentioned upthread?
Indeed, it seems like this thread is putting a fair amount of work
into a goal that we shouldn't even be trying to do. We gave up the
assumption that a partitioned table's children would have consistent
system columns the moment we decided to allow foreign tables as
partition leaf tables; and it's only going to get worse as alternate
table AMs become more of a thing. So now I'm thinking the right thing
to be working on is banning access to system columns via partition
parent tables (other than tableoid, which ought to work).
I'm not quite sure whether the right way to do that is just not
make pg_attribute entries for system columns of partitioned tables,
as we do for views; or make them but have a code-driven error if
you try to use one in a query. The second way is uglier, but
it should allow a more on-point error message. OTOH, if we start
to have different sets of system columns for different table AMs,
we can't have partitioned tables covering all of those sets.
In the meantime, if the back branches fail with something like
"virtual tuple table slot does not have system attributes" when
trying to do this, that's not great but I'm not sure we should
be putting effort into improving it.
regards, tom lane
Hi,
On 2021-04-21 15:10:09 -0400, Tom Lane wrote:
Indeed, it seems like this thread is putting a fair amount of work
into a goal that we shouldn't even be trying to do. We gave up the
assumption that a partitioned table's children would have consistent
system columns the moment we decided to allow foreign tables as
partition leaf tables; and it's only going to get worse as alternate
table AMs become more of a thing.
Agreed.
So now I'm thinking the right thing to be working on is banning access
to system columns via partition parent tables (other than tableoid,
which ought to work).
And ctid, I assume? While I have some hope for generalizing the
representation of tids at some point, I don't think it's realistic that
we'd actually get rid of them anytime soon.
I'm not quite sure whether the right way to do that is just not
make pg_attribute entries for system columns of partitioned tables,
as we do for views; or make them but have a code-driven error if
you try to use one in a query. The second way is uglier, but
it should allow a more on-point error message.
I'm leaning towards the approach of not having the pg_attribute entries
- it seems not great to have pg_attribute entries for columns that
actually cannot be queried. I don't think we can expect tooling querying
the catalogs to understand that.
OTOH, if we start to have different sets of system columns for
different table AMs, we can't have partitioned tables covering all of
those sets.
One could even imagine partition root specific system columns...
If we wanted AMs to actually be able to do introduce their own set of
system columns we'd need to change their representation to some
degree.
ISTM that the minimal changes needed would be to reorder sysattr.h to
have TableOidAttributeNumber be -2 (and keep
SelfItemPointerAttributeNumber as -1). And then slowly change all code
to not reference FirstLowInvalidHeapAttributeNumber - which seems like a
*substantial* amount of effort, due to all the shifting of AttrNumber by
FirstLowInvalidHeapAttributeNumber to be able to represent system
columns in bitmaps.
But perhaps that's the wrong direction, and we instead should work
towards *removing* system columns besides tableoid and ctid? At least
the way they work in heapam doesn't really make xmin,xmax,cmin,cmax
particularly useful. Wild speculation: Perhaps we ought to have some
parse-analysis-time handling for AM specific functions that return
additional information about a row, and that are evaluated directly
above (or even in) tableams?
In the meantime, if the back branches fail with something like
"virtual tuple table slot does not have system attributes" when
trying to do this, that's not great but I'm not sure we should
be putting effort into improving it.
Seems ok enough.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-21 15:10:09 -0400, Tom Lane wrote:
So now I'm thinking the right thing to be working on is banning access
to system columns via partition parent tables (other than tableoid,
which ought to work).
And ctid, I assume? While I have some hope for generalizing the
representation of tids at some point, I don't think it's realistic that
we'd actually get rid of them anytime soon.
Hmm, I'd have thought that ctid would be very high on the list of
things we don't want to assume are the same for all AMs. Admittedly,
refactoring that is going to be a large pain in the rear, but ...
I see that it actually works right now because slot_getsysattr
special-cases both tableoid and ctid, but I have a hard time
believing that that's a long-term answer.
One could even imagine partition root specific system columns...
Yeah. As I think about this, I recall a previous discussion where
we suggested that maybe partitioned tables could have a subset
of system columns, whereupon all their children would be required
to support (at least) those system columns. But that would have
to be user-controllable, so a lot of infrastructure would need to
be built for it. For the moment I'd be satisfied with a fixed
decision that only tableoid is treated that way.
ISTM that the minimal changes needed would be to reorder sysattr.h to
have TableOidAttributeNumber be -2 (and keep
SelfItemPointerAttributeNumber as -1). And then slowly change all code
to not reference FirstLowInvalidHeapAttributeNumber - which seems like a
*substantial* amount of effort, due to all the shifting of AttrNumber by
FirstLowInvalidHeapAttributeNumber to be able to represent system
columns in bitmaps.
Getting rid of FirstLowInvalidHeapAttributeNumber seems like nearly
a nonstarter. I'd be more inclined to reduce it by one or two so
as to leave some daylight for AMs that want system columns different
from the usual set. I don't feel any urgency about renumbering the
existing columns --- the value of that vs. the risk of breaking
things doesn't seem like a great tradeoff.
regards, tom lane
Hi,
On 2021-04-21 16:51:42 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-04-21 15:10:09 -0400, Tom Lane wrote:
So now I'm thinking the right thing to be working on is banning access
to system columns via partition parent tables (other than tableoid,
which ought to work).And ctid, I assume? While I have some hope for generalizing the
representation of tids at some point, I don't think it's realistic that
we'd actually get rid of them anytime soon.Hmm, I'd have thought that ctid would be very high on the list of
things we don't want to assume are the same for all AMs. Admittedly,
refactoring that is going to be a large pain in the rear, but ...
I don't really see us getting rid of something like ctid as a generic
concept across AMs - there's just too many places that need a way to
reference a specific tuple. However, I think we ought to change how much
code outside of AMs know about what tids mean. And, although that's a
significant lift on its own, we ought to make at least the generic
representation variable width.
I see that it actually works right now because slot_getsysattr
special-cases both tableoid and ctid, but I have a hard time
believing that that's a long-term answer.
Not fully responsive to this: I've previously wondered if we ought to
handle tableoid at parse-analysis or expression simplification time
instead of runtime. It's not really something that needs runtime
evaluation. But it's also not clear it's worth changing anything...
One could even imagine partition root specific system columns...
Yeah. As I think about this, I recall a previous discussion where
we suggested that maybe partitioned tables could have a subset
of system columns, whereupon all their children would be required
to support (at least) those system columns. But that would have
to be user-controllable, so a lot of infrastructure would need to
be built for it. For the moment I'd be satisfied with a fixed
decision that only tableoid is treated that way.
I don't really see a convincing usecase to add this kind of complication
with the current set of columns. Outside of tableoid and ctid every use
of system attributes was either wrong, or purely for debugging /
introspection, where restricting it to actual tables doesn't seem like a
problem.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-21 16:51:42 -0400, Tom Lane wrote:
Hmm, I'd have thought that ctid would be very high on the list of
things we don't want to assume are the same for all AMs. Admittedly,
refactoring that is going to be a large pain in the rear, but ...
I don't really see us getting rid of something like ctid as a generic
concept across AMs - there's just too many places that need a way to
reference a specific tuple. However, I think we ought to change how much
code outside of AMs know about what tids mean. And, although that's a
significant lift on its own, we ought to make at least the generic
representation variable width.
It seems like it might not be that hard to convert ctid generically
into a uint64, where heaps and heap-related indexes only use 6 bytes
of it. Variable-width I agree would be a very big complication added
on top, and I'm not quite convinced that we need it.
Not fully responsive to this: I've previously wondered if we ought to
handle tableoid at parse-analysis or expression simplification time
instead of runtime. It's not really something that needs runtime
evaluation. But it's also not clear it's worth changing anything...
If you do "RETURNING tableoid" in DML on a partitioned table, or on a
traditional-inheritance hierarchy, you get the OID of the particular
partition that was touched (indeed, that's the whole point of the
feature, IIRC). Don't really see how that could be implemented
earlier than runtime. Also don't see where the win would be, TBH.
regards, tom lane
Hi,
On 2021-04-21 17:38:53 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
I don't really see us getting rid of something like ctid as a generic
concept across AMs - there's just too many places that need a way to
reference a specific tuple. However, I think we ought to change how much
code outside of AMs know about what tids mean. And, although that's a
significant lift on its own, we ought to make at least the generic
representation variable width.It seems like it might not be that hard to convert ctid generically
into a uint64, where heaps and heap-related indexes only use 6 bytes
of it.
Yep.
Variable-width I agree would be a very big complication added on top,
and I'm not quite convinced that we need it.
I can see three (related) major cases where variable width tids would be
quite useful:
1) Creating an index-oriented-table AM would harder/more
limited with just an 8 byte tid
2) Supporting "indirect" indexes (i.e. indexes pointing to a primary
key, thereby being much cheaper to maintain when there are updates),
would require the primary key to map to an 8 byte integer.
3) Global indexes would be a lot easier if we had variable width tids
(but other ways of addressing the issue are possible).
Greetings,
Andres Freund
<div>Hi guys,</div><div> </div><div>Are you going to completely remove xmin from partitioned tables? So it will not work in <em>select ..., xmin from partitioned_table ?</em></div><div> </div><div>For us as users of Postgres (not knowing internal details) it is strange that there are some problems with getting current transaction and saving it as usual to xmin...</div><div> </div><div>What about first adding a "rowversion" type like in SQL Server for optimistic concurrency?</div><div> </div><div>I would remind that many code use xmin and would no be able to work with partitioned table...</div><div><div><a href="https://www.npgsql.org/efcore/modeling/concurrency.html" rel="noopener noreferrer" target="_blank">https://www.npgsql.org/efcore/modeling/concurrency.html</a></div><div> </div></div><div> </div><div>-- <br />С уважением, Павел</div><div> </div><div> </div><div> </div><div>22.04.2021, 00:48, "Andres Freund" <andres@anarazel.de>:</div><blockquote><p>Hi,<br /><br />On 2021-04-21 17:38:53 -0400, Tom Lane wrote:</p><blockquote> Andres Freund <<a href="mailto:andres@anarazel.de" rel="noopener noreferrer">andres@anarazel.de</a>> writes:<br /> > I don't really see us getting rid of something like ctid as a generic<br /> > concept across AMs - there's just too many places that need a way to<br /> > reference a specific tuple. However, I think we ought to change how much<br /> > code outside of AMs know about what tids mean. And, although that's a<br /> > significant lift on its own, we ought to make at least the generic<br /> > representation variable width.<br /><br /> It seems like it might not be that hard to convert ctid generically<br /> into a uint64, where heaps and heap-related indexes only use 6 bytes<br /> of it.</blockquote><p><br />Yep.<br /><br /> </p><blockquote> Variable-width I agree would be a very big complication added on top,<br /> and I'm not quite convinced that we need it.</blockquote><p><br />I can see three (related) major cases where variable width tids would be<br />quite useful:<br />1) Creating an index-oriented-table AM would harder/more<br /> limited with just an 8 byte tid<br />2) Supporting "indirect" indexes (i.e. indexes pointing to a primary<br /> key, thereby being much cheaper to maintain when there are updates),<br /> would require the primary key to map to an 8 byte integer.<br />3) Global indexes would be a lot easier if we had variable width tids<br /> (but other ways of addressing the issue are possible).<br /><br />Greetings,<br /><br />Andres Freund</p></blockquote>
On Thu, Apr 22, 2021 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
Forgot to ask in the last reply: for HEAD, should we consider simply
preventing accessing system columns when querying through a parent
partitioned table, as was also mentioned upthread?Indeed, it seems like this thread is putting a fair amount of work
into a goal that we shouldn't even be trying to do. We gave up the
assumption that a partitioned table's children would have consistent
system columns the moment we decided to allow foreign tables as
partition leaf tables; and it's only going to get worse as alternate
table AMs become more of a thing. So now I'm thinking the right thing
to be working on is banning access to system columns via partition
parent tables (other than tableoid, which ought to work).
Accessing both tableoid and ctid works currently. Based on the
discussion, it seems we're not so sure whether that will remain the
case for the 2nd going forward.
I'm not quite sure whether the right way to do that is just not
make pg_attribute entries for system columns of partitioned tables,
as we do for views; or make them but have a code-driven error if
you try to use one in a query. The second way is uglier, but
it should allow a more on-point error message. OTOH, if we start
to have different sets of system columns for different table AMs,
we can't have partitioned tables covering all of those sets.
I tend to agree with Andres that not having any pg_attribute entries
may be better.
In the meantime, if the back branches fail with something like
"virtual tuple table slot does not have system attributes" when
trying to do this, that's not great but I'm not sure we should
be putting effort into improving it.
Got it. That sounds like an acceptable compromise.
--
Amit Langote
EDB: http://www.enterprisedb.com
Amit Langote <amitlangote09@gmail.com> writes:
On Thu, Apr 22, 2021 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the meantime, if the back branches fail with something like
"virtual tuple table slot does not have system attributes" when
trying to do this, that's not great but I'm not sure we should
be putting effort into improving it.
Got it. That sounds like an acceptable compromise.
Hm, we're not done with this. Whatever you think of the merits of
throwing an implementation-level error, what's actually happening
in v12 and v13 in the wake of a71cfc56b is that an attempt to do
"RETURNING xmin" works in a non-cross-partition UPDATE, but in
a cross-partition UPDATE it dumps core. What I'm seeing is that
the result of the execute_attr_map_slot() call looks like
(gdb) p *(BufferHeapTupleTableSlot*) scanslot
$3 = {base = {base = {type = T_TupleTableSlot, tts_flags = 8, tts_nvalid = 3,
tts_ops = 0xa305c0 <TTSOpsBufferHeapTuple>,
tts_tupleDescriptor = 0x7f1c7798f920, tts_values = 0x2077100,
tts_isnull = 0x2075fb0, tts_mcxt = 0x2015ea0, tts_tid = {ip_blkid = {
bi_hi = 0, bi_lo = 0}, ip_posid = 3}, tts_tableOid = 37812},
tuple = 0x0, off = 0, tupdata = {t_len = 0, t_self = {ip_blkid = {
bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0,
t_data = 0x0}}, buffer = 0}
and since base.tuple is NULL, heap_getsysattr dies on either
"Assert(tup)" or a null-pointer dereference.
So
(1) It seems like this is exposing a shortcoming in the
multiple-slot-types logic. It's not hard to understand why the slot would
look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
but if this is not a legal state for a BufferHeapTuple slot, why didn't
ExecStoreVirtualTuple raise a complaint?
(2) It also seems like we can't use the srcSlot if we want to have
the fail-because-its-a-virtual-tuple behavior. I experimented with
doing ExecMaterializeSlot on the result of execute_attr_map_slot,
and that stops the crash, but then we're returning garbage values
of xmin etc, which does not seem good.
regards, tom lane
Hi,
On 2021-04-22 14:21:05 -0400, Tom Lane wrote:
Hm, we're not done with this. Whatever you think of the merits of
throwing an implementation-level error, what's actually happening
in v12 and v13 in the wake of a71cfc56b is that an attempt to do
"RETURNING xmin" works in a non-cross-partition UPDATE, but in
a cross-partition UPDATE it dumps core. What I'm seeing is that
the result of the execute_attr_map_slot() call looks like(gdb) p *(BufferHeapTupleTableSlot*) scanslot
$3 = {base = {base = {type = T_TupleTableSlot, tts_flags = 8, tts_nvalid = 3,
tts_ops = 0xa305c0 <TTSOpsBufferHeapTuple>,
tts_tupleDescriptor = 0x7f1c7798f920, tts_values = 0x2077100,
tts_isnull = 0x2075fb0, tts_mcxt = 0x2015ea0, tts_tid = {ip_blkid = {
bi_hi = 0, bi_lo = 0}, ip_posid = 3}, tts_tableOid = 37812},
tuple = 0x0, off = 0, tupdata = {t_len = 0, t_self = {ip_blkid = {
bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0,
t_data = 0x0}}, buffer = 0}and since base.tuple is NULL, heap_getsysattr dies on either
"Assert(tup)" or a null-pointer dereference.So
(1) It seems like this is exposing a shortcoming in the
multiple-slot-types logic. It's not hard to understand why the slot would
look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
but if this is not a legal state for a BufferHeapTuple slot, why didn't
ExecStoreVirtualTuple raise a complaint?
I think it's too useful to support ExecStoreVirtualTuple() in a
heap[buffer] slot to disallow that. Seems like we should make
tts_buffer_heap_getsysattr() error out if there's no tuple?
(2) It also seems like we can't use the srcSlot if we want to have
the fail-because-its-a-virtual-tuple behavior. I experimented with
doing ExecMaterializeSlot on the result of execute_attr_map_slot,
and that stops the crash, but then we're returning garbage values
of xmin etc, which does not seem good.
Garbage values as in 0's, or random data? Seems like it should be the
former?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-22 14:21:05 -0400, Tom Lane wrote:
(1) It seems like this is exposing a shortcoming in the
multiple-slot-types logic. It's not hard to understand why the slot would
look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
but if this is not a legal state for a BufferHeapTuple slot, why didn't
ExecStoreVirtualTuple raise a complaint?
I think it's too useful to support ExecStoreVirtualTuple() in a
heap[buffer] slot to disallow that. Seems like we should make
tts_buffer_heap_getsysattr() error out if there's no tuple?
OK, I could work with that. Shall we spell the error message the
same as if it really were a virtual slot, or does it need to be
different to avoid confusion?
(2) It also seems like we can't use the srcSlot if we want to have
the fail-because-its-a-virtual-tuple behavior. I experimented with
doing ExecMaterializeSlot on the result of execute_attr_map_slot,
and that stops the crash, but then we're returning garbage values
of xmin etc, which does not seem good.
Garbage values as in 0's, or random data? Seems like it should be the
former?
I was seeing something like xmin = 128. I think this might be from
our filling in the header as though for a composite Datum. In any
case, I think we need to either deliver the correct answer or throw
an error; silently returning zeroes wouldn't be good.
regards, tom lane
Hi,
On 2021-04-22 14:37:26 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-04-22 14:21:05 -0400, Tom Lane wrote:
(1) It seems like this is exposing a shortcoming in the
multiple-slot-types logic. It's not hard to understand why the slot would
look like this after execute_attr_map_slot does ExecStoreVirtualTuple,
but if this is not a legal state for a BufferHeapTuple slot, why didn't
ExecStoreVirtualTuple raise a complaint?I think it's too useful to support ExecStoreVirtualTuple() in a
heap[buffer] slot to disallow that. Seems like we should make
tts_buffer_heap_getsysattr() error out if there's no tuple?OK, I could work with that. Shall we spell the error message the
same as if it really were a virtual slot, or does it need to be
different to avoid confusion?
Hm. Seems like it'd be better to have a distinct error message? Feels
like it'll often indicate separate issues whether a non-materialized
heap slot or a virtual slot has its system columns accessed.
(2) It also seems like we can't use the srcSlot if we want to have
the fail-because-its-a-virtual-tuple behavior. I experimented with
doing ExecMaterializeSlot on the result of execute_attr_map_slot,
and that stops the crash, but then we're returning garbage values
of xmin etc, which does not seem good.Garbage values as in 0's, or random data? Seems like it should be the
former?I was seeing something like xmin = 128. I think this might be from
our filling in the header as though for a composite Datum.
I was wondering if we're not initializing memory somewhere were we
should...
In any case, I think we need to either deliver the correct answer or
throw an error; silently returning zeroes wouldn't be good.
IIRC there's some historical precedent in returning 0s in some
cases. But I don't think we should do that if we can avoid it.
Not entirely clear to me how we'd track whether we have valid system
column data or not once materialized - which I think is why we
historically had the cases where we returned bogus values.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-22 14:37:26 -0400, Tom Lane wrote:
OK, I could work with that. Shall we spell the error message the
same as if it really were a virtual slot, or does it need to be
different to avoid confusion?
Hm. Seems like it'd be better to have a distinct error message? Feels
like it'll often indicate separate issues whether a non-materialized
heap slot or a virtual slot has its system columns accessed.
After thinking about it for a bit, I'm inclined to promote this to
a user-facing error, and have all the slot types report
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot retrieve a system column in this context")));
which is at least somewhat intelligible to end users. A developer
trying to figure out why it happened would resort to \errverbose or
more likely gdb in any case, so the lack of specificity doesn't
seem like a problem.
Not entirely clear to me how we'd track whether we have valid system
column data or not once materialized - which I think is why we
historically had the cases where we returned bogus values.
Right, but with this fix we won't need to materialize.
regards, tom lane
Hi,
On 2021-04-22 15:09:59 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2021-04-22 14:37:26 -0400, Tom Lane wrote:
OK, I could work with that. Shall we spell the error message the
same as if it really were a virtual slot, or does it need to be
different to avoid confusion?Hm. Seems like it'd be better to have a distinct error message? Feels
like it'll often indicate separate issues whether a non-materialized
heap slot or a virtual slot has its system columns accessed.After thinking about it for a bit, I'm inclined to promote this to
a user-facing error, and have all the slot types reportereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot retrieve a system column in this context")));
WFM.
which is at least somewhat intelligible to end users. A developer
trying to figure out why it happened would resort to \errverbose or
more likely gdb in any case, so the lack of specificity doesn't
seem like a problem.
I'm wondering if it'd be a good idea to add TupleTableSlotOps.name,
which we then could include in error messages without needing to end up
with per-slot-type code... But that's probably a separate project from
adding the error above.
Not entirely clear to me how we'd track whether we have valid system
column data or not once materialized - which I think is why we
historically had the cases where we returned bogus values.Right, but with this fix we won't need to materialize.
In this path - but it does seem mildly bothersome that we might do the
wrong thing in other paths. If we at least returned something halfway
sensible (e.g. NULL) instead of 128... I guess we could track whether
the tuple originated externally, or whether it's from a materialized
virtual tuple...
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2021-04-22 15:09:59 -0400, Tom Lane wrote:
After thinking about it for a bit, I'm inclined to promote this to
a user-facing error, and have all the slot types report
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot retrieve a system column in this context")));
WFM.
OK, I'll go make that happen.
regards, tom lane
On Fri, Apr 23, 2021 at 3:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Amit Langote <amitlangote09@gmail.com> writes:
On Thu, Apr 22, 2021 at 4:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the meantime, if the back branches fail with something like
"virtual tuple table slot does not have system attributes" when
trying to do this, that's not great but I'm not sure we should
be putting effort into improving it.Got it. That sounds like an acceptable compromise.
Hm, we're not done with this. Whatever you think of the merits of
throwing an implementation-level error, what's actually happening
in v12 and v13 in the wake of a71cfc56b is that an attempt to do
"RETURNING xmin" works in a non-cross-partition UPDATE, but in
a cross-partition UPDATE it dumps core.
Thanks for fixing this.
--
Amit Langote
EDB: http://www.enterprisedb.com