Why does ExecComputeStoredGenerated() form a heap tuple
Hi,
while rebasing the remaining tableam patches (luckily a pretty small set
now!), I had a few conflicts with ExecComputeStoredGenerated(). While
resolving I noticed:
oldtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
newtuple = heap_modify_tuple(oldtuple, tupdesc, values, nulls, replaces);
ExecForceStoreHeapTuple(newtuple, slot);
if (should_free)
heap_freetuple(oldtuple);
MemoryContextSwitchTo(oldContext);
First off, I'm not convinced this is correct:
ISTM you'd need at least an ExecMaterializeSlot() before the
MemoryContextSwitchTo() in ExecComputeStoredGenerated().
But what actually brought me to reply was that it seems like it'll cause
unnecessary slowdowns for !heap AMs. First, it'll form a heaptuple if
the slot isn't in that form, and then it'll cause a conversion by
storing a heap tuple even if the target doesn't use heap representation.
ISTM the above would be much more efficiently - even more efficient if
only heap is used - implemented as something roughly akin to:
slot_getallattrs(slot);
memcpy(values, slot->tts_values, ...);
memcpy(nulls, slot->tts_isnull, ...);
for (int i = 0; i < natts; i++)
{
if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED)
{
values[i] = ...
}
else
values[i] = datumCopy(...);
}
ExecClearTuple(slot);
memcpy(slot->tts_values, values, ...);
memcpy(slot->tts_isnull, nulls, ...);
ExecStoreVirtualTuple(slot);
ExecMaterializeSlot(slot);
that's not perfect, but more efficient than your version...
Greetings,
Andres Freund
Hi,
On 2019-03-30 19:57:44 -0700, Andres Freund wrote:
while rebasing the remaining tableam patches (luckily a pretty small set
now!), I had a few conflicts with ExecComputeStoredGenerated(). While
resolving I noticed:oldtuple = ExecFetchSlotHeapTuple(slot, true, &should_free);
newtuple = heap_modify_tuple(oldtuple, tupdesc, values, nulls, replaces);
ExecForceStoreHeapTuple(newtuple, slot);
if (should_free)
heap_freetuple(oldtuple);MemoryContextSwitchTo(oldContext);
First off, I'm not convinced this is correct:
ISTM you'd need at least an ExecMaterializeSlot() before the
MemoryContextSwitchTo() in ExecComputeStoredGenerated().But what actually brought me to reply was that it seems like it'll cause
unnecessary slowdowns for !heap AMs. First, it'll form a heaptuple if
the slot isn't in that form, and then it'll cause a conversion by
storing a heap tuple even if the target doesn't use heap representation.ISTM the above would be much more efficiently - even more efficient if
only heap is used - implemented as something roughly akin to:slot_getallattrs(slot);
memcpy(values, slot->tts_values, ...);
memcpy(nulls, slot->tts_isnull, ...);for (int i = 0; i < natts; i++)
{
if (TupleDescAttr(tupdesc, i)->attgenerated == ATTRIBUTE_GENERATED_STORED)
{
values[i] = ...
}
else
values[i] = datumCopy(...);
}ExecClearTuple(slot);
memcpy(slot->tts_values, values, ...);
memcpy(slot->tts_isnull, nulls, ...);
ExecStoreVirtualTuple(slot);
ExecMaterializeSlot(slot);that's not perfect, but more efficient than your version...
Also, have you actually benchmarked this code? ISTM that adding a
stored generated column would cause quite noticable slowdowns in the
COPY path based on this code.
Greetings,
Andres Freund
On 2019-03-31 04:57, Andres Freund wrote:
while rebasing the remaining tableam patches (luckily a pretty small set
now!), I had a few conflicts with ExecComputeStoredGenerated(). While
resolving I noticed:
The core of that code was written a long time ago and perhaps hasn't
caught up with all the refactoring going on. I'll look through your
proposal and update the code.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-03-31 05:00, Andres Freund wrote:
Also, have you actually benchmarked this code? ISTM that adding a
stored generated column would cause quite noticable slowdowns in the
COPY path based on this code.
Yes, it'll be slower than not having it, but it's much faster than the
equivalent trigger.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2019-04-01 11:25:46 +0200, Peter Eisentraut wrote:
On 2019-03-31 05:00, Andres Freund wrote:
Also, have you actually benchmarked this code? ISTM that adding a
stored generated column would cause quite noticable slowdowns in the
COPY path based on this code.Yes, it'll be slower than not having it, but it's much faster than the
equivalent trigger.
It at the moment is quite noticably slower than directly inserting the
generated column.
postgres[11993][1]=# CREATE TABLE foo_without_generated(id int, copy_of_int int);
CREATE TABLE
Time: 0.625 ms
postgres[11993][1]=# CREATE TABLE foo_with_generated(id int, copy_of_int int generated always as (id) stored);
CREATE TABLE
Time: 0.771 ms
postgres[11993][1]=# INSERT INTO foo_without_generated SELECT g.i, g.i FROM generate_series(1, 1000000) g(i);
INSERT 0 1000000
Time: 691.533 ms
postgres[11993][1]=# INSERT INTO foo_with_generated SELECT g.i FROM generate_series(1, 1000000) g(i);
INSERT 0 1000000
Time: 825.471 ms
postgres[11993][1]=# COPY foo_without_generated TO '/tmp/foo_without_generated';
COPY 1000000
Time: 194.051 ms
postgres[11993][1]=# COPY foo_with_generated TO '/tmp/foo_with_generated';
COPY 1000000
Time: 153.146 ms
postgres[11993][1]=# ;TRUNCATE foo_without_generated ;COPY foo_without_generated FROM '/tmp/foo_without_generated';
Time: 0.178 ms
TRUNCATE TABLE
Time: 8.456 ms
COPY 1000000
Time: 394.990 ms
postgres[11993][1]=# ;TRUNCATE foo_with_generated ;COPY foo_with_generated FROM '/tmp/foo_with_generated';
Time: 0.147 ms
TRUNCATE TABLE
Time: 8.043 ms
COPY 1000000
Time: 508.918 ms
From a quick profile that's indeed largely because
ExecComputeStoredGenerated() is really inefficient - and it seems
largely unnecessarily so. I think this should at least be roughly as
efficient as getting the additional data from the client.
Minor other point: I'm not a fan of defining more general infrastructure
like ExecComputedStoredGenerated() in nodeModifyTable.c - it's already
large and confusing, and it's not obvious that e.g. COPY would call into
it.
Greetings,
Andres Freund
On 2019-04-01 11:23, Peter Eisentraut wrote:
On 2019-03-31 04:57, Andres Freund wrote:
while rebasing the remaining tableam patches (luckily a pretty small set
now!), I had a few conflicts with ExecComputeStoredGenerated(). While
resolving I noticed:The core of that code was written a long time ago and perhaps hasn't
caught up with all the refactoring going on. I'll look through your
proposal and update the code.
The attached patch is based on your sketch. It's clearly better in the
long term not to rely on heap tuples here. But in testing this change
seems to make it slightly slower, certainly not a speedup as you were
apparently hoping for.
Test setup:
create table t0 (a int, b int);
insert into t0 select generate_series (1, 10000000); -- 10 million
\copy t0 (a) to 'test.dat';
-- for comparison, without generated column
truncate t0;
\copy t0 (a) from 'test.dat';
-- master
create table t1 (a int, b int generated always as (a * 2) stored);
truncate t1;
\copy t1 (a) from 'test.dat';
-- patched
truncate t1;
\copy t1 (a) from 'test.dat';
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Don-t-form-heap-tuple-in-ExecComputeStoredGenerated.patchtext/plain; charset=UTF-8; name=0001-Don-t-form-heap-tuple-in-ExecComputeStoredGenerated.patch; x-mac-creator=0; x-mac-type=0Download+12-16
On Tue, 23 Apr 2019 at 20:23, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
The attached patch is based on your sketch. It's clearly better in the
long term not to rely on heap tuples here. But in testing this change
seems to make it slightly slower, certainly not a speedup as you were
apparently hoping for.Test setup:
create table t0 (a int, b int);
insert into t0 select generate_series (1, 10000000); -- 10 million
\copy t0 (a) to 'test.dat';-- for comparison, without generated column
truncate t0;
\copy t0 (a) from 'test.dat';-- master
create table t1 (a int, b int generated always as (a * 2) stored);
truncate t1;
\copy t1 (a) from 'test.dat';-- patched
truncate t1;
\copy t1 (a) from 'test.dat';
I didn't do the exact same test, but if I use COPY instead of \copy,
then for me patched is faster.
Normal table:
postgres=# copy t0 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5437.768 ms (00:05.438)
postgres=# truncate t0;
TRUNCATE TABLE
Time: 20.775 ms
postgres=# copy t0 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5272.228 ms (00:05.272)
Master:
postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 6570.031 ms (00:06.570)
postgres=# truncate t1;
TRUNCATE TABLE
Time: 17.813 ms
postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 6486.253 ms (00:06.486)
Patched:
postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5359.338 ms (00:05.359)
postgres=# truncate table t1;
TRUNCATE TABLE
Time: 25.551 ms
postgres=# copy t1 (a) from '/home/drowley/test.dat';
COPY 10000000
Time: 5347.596 ms (00:05.348)
For the patch, I wonder if you need this line:
+ memcpy(values, slot->tts_values, sizeof(*values) * natts);
If you got rid of that and changed the datumCopy to use
slot->tts_values[i] instead.
Maybe it's also worth getting rid of the first memcpy for the null
array and just assign the element in the else clause.
It might also be cleaner to assign TupleDescAttr(tupdesc, i) to a
variable instead of using the macro 3 times. It'd make that datumCopy
line shorter too.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2019-04-24 00:26, David Rowley wrote:
I didn't do the exact same test, but if I use COPY instead of \copy,
then for me patched is faster.
OK, confirmed that way, too.
For the patch, I wonder if you need this line:
+ memcpy(values, slot->tts_values, sizeof(*values) * natts);
If you got rid of that and changed the datumCopy to use
slot->tts_values[i] instead.
done
Maybe it's also worth getting rid of the first memcpy for the null
array and just assign the element in the else clause.
Tried that, seems to be slower. So I left it as is.
It might also be cleaner to assign TupleDescAttr(tupdesc, i) to a
variable instead of using the macro 3 times. It'd make that datumCopy
line shorter too.
Also done.
Updated patch attached.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v2-0001-Convert-ExecComputeStoredGenerated-to-use-tuple-s.patchtext/plain; charset=UTF-8; name=v2-0001-Convert-ExecComputeStoredGenerated-to-use-tuple-s.patch; x-mac-creator=0; x-mac-type=0Download+52-23
On Thu, 16 May 2019 at 05:44, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
Updated patch attached.
This patch looks okay to me.
It's not for this patch, or probably for PG12, but it would be good if
we could avoid the formation of the Tuple until right before the new
tuple is inserted.
I see heap_form_tuple() is called 3 times for a single INSERT with:
create table t (a text, b text, c text generated always as (b || b) stored);
create or replace function t_trigger() returns trigger as $$
begin
NEW.b = UPPER(NEW.a);
RETURN NEW;
end;
$$ language plpgsql;
create trigger t_on_insert before insert on t for each row execute
function t_trigger();
insert into t (a) values('one');
and heap_deform_tuple() is called once for each additional
heap_form_tuple(). That's pretty wasteful :-(
Maybe Andres can explain if this is really required, or if it's just
something that's not well optimised yet.
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On 2019-05-20 14:23:34 +1200, David Rowley wrote:
It's not for this patch, or probably for PG12, but it would be good if
we could avoid the formation of the Tuple until right before the new
tuple is inserted.I see heap_form_tuple() is called 3 times for a single INSERT with:
create table t (a text, b text, c text generated always as (b || b) stored);
create or replace function t_trigger() returns trigger as $$
begin
NEW.b = UPPER(NEW.a);
RETURN NEW;
end;
$$ language plpgsql;create trigger t_on_insert before insert on t for each row execute
function t_trigger();insert into t (a) values('one');
and heap_deform_tuple() is called once for each additional
heap_form_tuple(). That's pretty wasteful :-(Maybe Andres can explain if this is really required, or if it's just
something that's not well optimised yet.
I think we can optimize this further, but it's not unexpected.
I see:
1) ExecCopySlot() call in in ExecModifyTable(). For INSERT SELECT the
input will be in a virtual slot. We might be able to have some
trickery to avoid this one in some case. Not sure how much it'd help
- I think we most of the time would just move the forming of the
tuple around - ExecInsert() wants to materialize the slot.
2) plpgsql form/deform due to updating a field. I don't see how we could
easily fix that. We'd have to invent a mechanism that allows plpgsql to pass
slots around. I guess it's possible you could make that work somehow?
But I think we'd also need to change the external trigger interface -
which currently specifies that the return type is a HeapTuple
3) ExecComputeStoredGenerated(). I suspect it's not particularly useful
to get rid of the heap_form_tuple (from with ExecMaterialize())
here. When actually inserting we'll have to actually form the tuple
anyway. But what I do wonder is whether it would make sense to move
the materialization outside of that function. If there's constraints,
or partitioning, we'll have to deform (parts of) the tuple, to access
the necessary columns.
Currently materializing an unmaterialized slot (i.e. making it
independent from anything but memory referenced by the slot) also means
that later accesses will need to deform again. I'm fairly sure we can
improve that for many cases (IIRC we don't need to that for virtual
slots, but that's irrelevant here).
I'm pretty sure we get rid of most of this, but it'll be some work. I'm
also not sure how important it is - for INSERT/UPDATE, in how many cases
is the bottleneck those copies, rather than other parts of query
execution? I suspect you can measure it for some INSERT ... SELECT type
cases - but probably the overhead of triggers and GENERATED is going to
be higher.
Greetings,
Andres Freund
On 2019-05-20 04:23, David Rowley wrote:
On Thu, 16 May 2019 at 05:44, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:Updated patch attached.
This patch looks okay to me.
committed
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services