System column support for partitioned tables using heap
I've run into an existing behavior where xmax(), and various other system
columns, return an error when included in the RETURNING list on a
partitioned table.
ERROR: cannot retrieve a system column in this context
This issue got a fair airing back in 2020:
AW: posgres 12 bug (partitioned table)
/messages/by-id/GVAP278MB006939B1D7DFDD650E383FBFEACE0@GVAP278MB0069.CHEP278.PROD.OUTLOOK.COM
I'm using 14.4, and just ran into this behavior today. I'm wondering if
there has been any new work on this subject, or anything to take into
account moving forward?
I'm not a C coder, and do not know the Postgres internals, but here's what
I gleaned from the thread:
* Available system columns depend on the underlying table access method,
and may (or will) vary across AMs. For example, heap implements the columns
the docs describe; an FDW could be backed by anything, and Postgres has no
control over what system-column-like attributes, if any, it supports; and
future or hypothetical AMs may have different sets.
* Rather than return garbage results, or a default of 0, etc., the system
throws the error I ran into.
I'd be happier working with a NULL result than garbage or ambiguous
results, but an error is the current behavior, and I agree with that
choice: better an error than a bad/meaningless result. Postgres'
consistent emphasis on correctness is easily one of its greatest qualities.
In my case, I'm upgrading a lot of existing code to try to capture a more
complete profile of what an UPSERT did. Right now, I grab a count(*) of the
rows and return that, which works fine. A revised snippet looks a bit like this:
------------------------------------------------------------
...UPSERT code
    returning xmax as inserted_transaction_id),

status_data AS (
    select count(*) FILTER (where inserted_transaction_id = 0)  AS insert_count,
           count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count,
           pg_current_xact_id_if_assigned()::text               AS transaction_id
      from inserted_rows),

...custom logging code

-- Final output/result.
select insert_count,
       estimated_update_count,
       transaction_id
  from status_data;
------------------------------------------------------------
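For anyone who wants to reproduce the error, a minimal sketch looks something
like the following (table and column names here are illustrative, not from our
schema; the same statement succeeds on a plain heap table):

```sql
CREATE TABLE events (id int PRIMARY KEY, note text) PARTITION BY RANGE (id);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (1) TO (1000);

-- Fails only because events is partitioned:
INSERT INTO events (id, note) VALUES (1, 'first')
ON CONFLICT (id) DO UPDATE SET note = EXCLUDED.note
RETURNING xmax AS inserted_transaction_id;
-- ERROR:  cannot retrieve a system column in this context
```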
This fails on a partitioned table because xmax() may not exist. In fact, it
does exist in all of those tables, but the system doesn't know how to
guarantee that. I know which tables are partitioned, and can downgrade the
result on partitioned tables to the count(*) I've been using to date. But
now I'm wondering if working with xmax() like this is a poor idea going
forward. I don't want to lean on a feature/behavior that's likely to
change. For example, I noticed the other day that MERGE does not support
RETURNING.
I'd appreciate any insight or advice you can offer.
On Sun, Jul 17, 2022 at 9:04 PM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
This fails on a partitioned table because xmax() may not exist. In fact, it does exist in all of those tables, but the system doesn't know how to guarantee that. I know which tables are partitioned, and can downgrade the result on partitioned tables to the count(*) I've been using to date. But now I'm wondering if working with xmax() like this is a poor idea going forward. I don't want to lean on a feature/behavior that's likely to change. For example, I noticed the other day that MERGE does not support RETURNING.
I'd appreciate any insight or advice you can offer.
What is motivating you to want to see the xmax value here? It's not an
unreasonable thing to want to do, IMHO, but it's a little bit niche so
I'm just curious what the motivation is.
I do agree with you that it would be nice if this worked better than
it does, but I don't really know exactly how to make that happen. The
column list for a partitioned table must be fixed at the time it is
created, but we do not know what partitions might be added in the
future, and thus we don't know whether they will have an xmax column.
I guess we could have tried to work things out so that a 0 value would
be passed up from children that lack an xmax column, and that would
allow the parent to have such a column, but I don't feel too bad that
we didn't do that ... should I?
--
Robert Haas
EDB: http://www.enterprisedb.com
What is motivating you to want to see the xmax value here? It's not an
unreasonable thing to want to do, IMHO, but it's a little bit niche so
I'm just curious what the motivation is.
Yeah, I figured it was niche when I saw so little mention of the issue.
My reason for xmax() in the result is to break down the affected rows count
into an insert count and an estimated update count. Not super critical, but
helpful. I've built out some simple custom logging tables in our system for
this kind of detail, and folks have been wanting to break down rows
submitted, rows inserted, and rows updated a bit better. Rows submitted is
easy, and rows inserted is too... the update count is an estimate, as I'm not
doing anything fancy with xmax() to sort out what exactly happened.
For clarification, we're not using an ORM, and may need to support
straggling clients, so our push cycle works like this:
* Create a view with the fields expected in the insert. I figured I'd use
CREATE VIEW instead of CREATE TYPE as then I can quickly check out the
details against live data, and I still get a custom compound type.
* Write a function that accepts an array of view_name_type. I *love* Postgres'
typing system; it has spoiled me forever. You can't submit badly formatted
objects from the client, they're rejected automatically.
* Write a client-side routine to package data as an array and push it into
the insert handling function. The function unnests the array, and then the
actual insert code draws from the unpacked values. If I need to extend the
table, I can add a new function that knows about the revised fields, and
revise (when necessary) earlier supported formats to map to new
types/columns/defaults.
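A stripped-down sketch of the first two steps might look like this (names are
illustrative, and the real functions return richer results than this):

```sql
-- 1. A view that doubles as the expected input shape. It's easy to sanity-check
--    against live data, and it defines a custom compound type for free.
CREATE VIEW types_plus.item_type_v1 AS
    SELECT id,
           marked_for_deletion,
           name_
      FROM item_type;

-- 2. An insert handler that only accepts an array of that type. Badly
--    formatted rows from a client are rejected before the body ever runs.
CREATE FUNCTION types_plus.insert_item_type_v1 (data_in types_plus.item_type_v1[])
RETURNS void
LANGUAGE SQL
BEGIN ATOMIC
    INSERT INTO item_type (id, marked_for_deletion, name_)
    SELECT rows_in.id, rows_in.marked_for_deletion, rows_in.name_
      FROM unnest(data_in) AS rows_in;
END;
```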
There are a few CTEs in there, including one that does the main insert and
returns xmax. That lets me distinguish xmax = 0 (insert) from xmax <>
0 (not an insert).
I do agree with you that it would be nice if this worked better than
it does, but I don't really know exactly how to make that happen. The
column list for a partitioned table must be fixed at the time it is
created, but we do not know what partitions might be added in the
future, and thus we don't know whether they will have an xmax column.
I guess we could have tried to work things out so that a 0 value would
be passed up from children that lack an xmax column, and that would
allow the parent to have such a column, but I don't feel too bad that
we didn't do that ... should I?
You should never feel bad about anything ;-) You and others on that thread
contribute so much that I get a great deal of value out of it.
I had it in mind that it would be nice to have some kind of
catalog/abstraction that would make it possible to interrogate what system
columns are available on a table/partition based on access method. In my
vague notion, that might make some of the other ideas from that thread,
such as index-oriented stores with quite different physical layouts, easier
to implement. But, it's all free when you aren't the one who can write the
code.
I've switched the partition-based tables back to returning * on the insert
CTE, and then aggregating that to add to a log table and the client result.
It's fine. A rich result summary would be very nice. As in rows
added/modified/deleted on whatever table(s). If anyone ever decides to
implement such a structure for MERGE, it would be nice to see it
retrofitted to the other data modification commands where RETURNING works.
On Tue, Jul 19, 2022 at 6:13 AM Robert Haas <robertmhaas@gmail.com> wrote:
The column list for a partitioned table must be fixed at the time it is
created, but we do not know what partitions might be added in the
future, and thus we don't know whether they will have an xmax column.
Right, I see what you mean there. It's fantastic that a partition might be
an FDW to a system that has no concept at all of anything like a
"system column", or a table with an alternative AM whose system columns
differ from heap's. That flexibility in partitions is super
valuable. I'd love to be able to convert old partitions into column stores,
for example. (I think that Citus offers that feature now.)
I guess if anyone ever felt it was worth the effort, maybe whatever checks
are done at attach-partition time for the column list could also cover
meta/system columns. Where those are missing, some kind of shimming
mechanism would pretty much be necessary.
Sounds like a lot of work for not much gain, at least in this narrow case.
Thanks again for answering.
On Tue, Jul 19, 2022 at 6:43 PM Morris de Oryx <morrisdeoryx@gmail.com>
wrote:
On Tue, Jul 19, 2022 at 4:44 AM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
My reason for xmax() in the result is to break down the affected rows count into an insert count and an estimated update count. Not super critical, but helpful. I've built out some simple custom logging tables in our system for this kind of detail, and folks have been wanting to break down rows submitted, rows inserted, and rows updated a bit better. Rows submitted is easy, and rows inserted is too... the update count is an estimate, as I'm not doing anything fancy with xmax() to sort out what exactly happened.
I wonder whether you could just have the CTEs bubble up 1 or 0 and
then sum them at some stage, instead of relying on xmax. Presumably
your UPSERT simulation knows which thing it did in each case.
For MERGE itself, I wonder if some information about this should be
included in the command tag. It looks like MERGE already includes some
sort of row count in the command tag, but I guess perhaps it doesn't
distinguish between inserts and updates. I don't know why we couldn't
expose multiple values this way, though.
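For reference, the command tags in question look something like this from psql
(MERGE as of PostgreSQL 15; the staging table is made up for the example). The
MERGE tag carries only a combined count, with no insert/update breakdown:

```sql
INSERT INTO item_type VALUES (1, false, 'a'), (2, false, 'b');
-- Command tag: INSERT 0 2

MERGE INTO item_type AS t
USING staging AS s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name_ = s.name_
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.marked_for_deletion, s.name_);
-- Command tag: MERGE 2   (total of rows inserted, updated, and deleted)
```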
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 19, 2022 at 10:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
For MERGE itself, I wonder if some information about this should be
included in the command tag. It looks like MERGE already includes some
sort of row count in the command tag, but I guess perhaps it doesn't
distinguish between inserts and updates. I don't know why we couldn't
expose multiple values this way, though.
It would be great to get some sort of feedback from MERGE accessible
through SQL results, even if that doesn't come in the form of a RETURNING
list.
I wonder whether you could just have the CTEs bubble up 1 or 0 and
then sum them at some stage, instead of relying on xmax. Presumably
your UPSERT simulation knows which thing it did in each case.
It might help if I show a sample insert handling function. The issue is
with the line at the end of the top CTE, inserted_rows:
returning xmax as inserted_transaction_id),
That's what fails on partitions. Is there an alternative way to test what
happened to the row(s)? Here's the full function. I wrote a code
generator, so I don't have to hand-code all of these bits for each
table+version:
-- Create a function to accept an array of rows formatted as item_type_v1
-- for UPSERT into item_type.
DROP FUNCTION IF EXISTS types_plus.insert_item_type_v1 (types_plus.item_type_v1[]);

CREATE OR REPLACE FUNCTION types_plus.insert_item_type_v1 (data_in types_plus.item_type_v1[])

RETURNS TABLE (
    insert_count integer,
    estimated_update_count integer,
    transaction_id text)

LANGUAGE SQL

BEGIN ATOMIC

-- The CTE below is a roundabout way of returning an insertion count from a
-- pure SQL function in Postgres.
WITH
inserted_rows as (
    INSERT INTO item_type (
        id,
        marked_for_deletion,
        name_)

    SELECT
        rows_in.id,
        rows_in.marked_for_deletion,
        rows_in.name_

    FROM unnest(data_in) as rows_in

    ON CONFLICT(id) DO UPDATE SET
        marked_for_deletion = EXCLUDED.marked_for_deletion,
        name_               = EXCLUDED.name_

    returning xmax as inserted_transaction_id),

status_data AS (
    select count(*) FILTER (where inserted_transaction_id = 0)  AS insert_count,
           count(*) FILTER (where inserted_transaction_id != 0) AS estimated_update_count,
           pg_current_xact_id_if_assigned()::text               AS transaction_id

      from inserted_rows),

insert_log_entry AS (
    INSERT INTO insert_log (
        data_file_id,
        ib_version,
        job_run_id,
        schema_name,
        table_name,
        records_submitted,
        insert_count,
        estimated_update_count)

    SELECT
        coalesce_session_variable(
            'data_file_id',
            '00000000000000000000000000000000')::uuid,

        coalesce_session_variable('ib_version'), -- Default result is ''

        coalesce_session_variable(
            'job_run_id',
            '00000000000000000000000000000000')::uuid,

        'ascendco',
        'item_type',
        (select cardinality(data_in)),
        insert_count,
        estimated_update_count

    FROM status_data
)

-- Final output/result.
select insert_count,
       estimated_update_count,
       transaction_id
  from status_data;

END;
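One workaround sometimes suggested, sketched here under the assumption that
the table carries created_at/updated_at timestamp columns maintained on every
write (item_type above does not; these columns are hypothetical), is to infer
insert-vs-update without touching system columns at all, so it also works on
partitioned tables. Caveat: now() is fixed per transaction, so a row inserted
and re-upserted in the same transaction would still compare equal:

```sql
-- Hypothetical variant of the inserted_rows CTE: no system columns needed.
INSERT INTO item_type AS t (id, marked_for_deletion, name_, created_at, updated_at)
SELECT rows_in.id, rows_in.marked_for_deletion, rows_in.name_, now(), now()
  FROM unnest(data_in) AS rows_in
ON CONFLICT (id) DO UPDATE SET
    marked_for_deletion = EXCLUDED.marked_for_deletion,
    name_               = EXCLUDED.name_,
    updated_at          = now()
RETURNING (t.created_at = t.updated_at) AS was_inserted;
```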
On Tue, Jul 19, 2022 at 11:22 PM Morris de Oryx <morrisdeoryx@gmail.com> wrote:
It might help if I show a sample insert handling function. The issue is with the line at the end of the top CTE, inserted_rows:
returning xmax as inserted_transaction_id),
That's what fails on partitions. Is there an alternative way to test what happened to the row(s)? Here's the full function. I wrote a code generator, so I don't have to hand-code all of these bits for each table+version:
Oh I see. I didn't realize you were using INSERT .. ON CONFLICT
UPDATE, but that makes tons of sense, and I don't see an obvious
alternative to the way you wrote this.
Hmm.
--
Robert Haas
EDB: http://www.enterprisedb.com