ALTER TABLE ADD COLUMN fast default
Attached is a patch for $subject. It's based on Serge Reilau's patch of
about a year ago, taking into account comments made at the time. with
bitrot removed and other enhancements such as documentation.
Essentially it stores a value in pg_attribute that is used when the
stored tuple is missing the attribute. This works unless the default
expression is volatile, in which case a table rewrite is forced as
happens now.
Comments welcome.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
fast_default-v2.patchtext/x-patch; name=fast_default-v2.patchDownload+1829-249
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
Attached is a patch for $subject. It's based on Serge Reilau's patch of
about a year ago, taking into account comments made at the time. with
bitrot removed and other enhancements such as documentation.
Essentially it stores a value in pg_attribute that is used when the
stored tuple is missing the attribute. This works unless the default
expression is volatile, in which case a table rewrite is forced as
happens now.
I'm quite disturbed both by the sheer invasiveness of this patch
(eg, changing the API of heap_attisnull()) and by the amount of logic
it adds to fundamental tuple-deconstruction code paths. I think
we'll need to ask some hard questions about whether the feature gained
is worth the distributed performance loss that will ensue.
regards, tom lane
On 12/06/2017 10:16 AM, Tom Lane wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
Attached is a patch for $subject. It's based on Serge Reilau's patch of
about a year ago, taking into account comments made at the time. with
bitrot removed and other enhancements such as documentation.
Essentially it stores a value in pg_attribute that is used when the
stored tuple is missing the attribute. This works unless the default
expression is volatile, in which case a table rewrite is forced as
happens now.I'm quite disturbed both by the sheer invasiveness of this patch
(eg, changing the API of heap_attisnull()) and by the amount of logic
it adds to fundamental tuple-deconstruction code paths. I think
we'll need to ask some hard questions about whether the feature gained
is worth the distributed performance loss that will ensue.
Thanks for prompt comments.
I'm sure we can reduce the footprint some.
w.r.t. heap_attisnull, only a handful of call sites actually need the
new functionality, 8 or possibly 10 (I need to check more on those in
relcache.c). So we could instead of changing the function provide a new
one to be used at those sites. I'll work on that. and submit a new patch
fairly shortly.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
On 12/06/2017 10:16 AM, Tom Lane wrote:
I'm quite disturbed both by the sheer invasiveness of this patch
(eg, changing the API of heap_attisnull()) and by the amount of logic
it adds to fundamental tuple-deconstruction code paths. I think
we'll need to ask some hard questions about whether the feature gained
is worth the distributed performance loss that will ensue.
I'm sure we can reduce the footprint some.
w.r.t. heap_attisnull, only a handful of call sites actually need the
new functionality, 8 or possibly 10 (I need to check more on those in
relcache.c). So we could instead of changing the function provide a new
one to be used at those sites. I'll work on that. and submit a new patch
fairly shortly.
Meh, I'm not at all sure that's an improvement. A version of
heap_attisnull that ignores the possibility of a "missing" attribute value
will give the *wrong answer* if the other version should have been used
instead. Furthermore it'd be an attractive nuisance because people would
fail to notice if an existing call were now unsafe. If we're to do this
I think we have no choice but to impose that compatibility break on
third-party code.
In general, you need to be thinking about this in terms of making sure
that it's impossible to accidentally not update code that needs to be
updated. Another example is that I noticed that some places the patch
has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
the former will fail to copy the missing-attribute support data.
I think that's an astonishingly bad idea. There is basically no situation
in which CreateTupleDescCopy defined in that way will ever be safe to use.
The missing-attribute info is now a fundamental part of a tupdesc and it
has to be copied always, just as much as e.g. atttypid.
I would opine that it's a mistake to design TupleDesc as though the
missing-attribute data had anything to do with default values. It may
have originated from the same place but it's a distinct thing. Better
to store it in a separate sub-structure.
regards, tom lane
On 12/06/2017 11:05 AM, Tom Lane wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
On 12/06/2017 10:16 AM, Tom Lane wrote:
I'm quite disturbed both by the sheer invasiveness of this patch
(eg, changing the API of heap_attisnull()) and by the amount of logic
it adds to fundamental tuple-deconstruction code paths. I think
we'll need to ask some hard questions about whether the feature gained
is worth the distributed performance loss that will ensue.I'm sure we can reduce the footprint some.
w.r.t. heap_attisnull, only a handful of call sites actually need the
new functionality, 8 or possibly 10 (I need to check more on those in
relcache.c). So we could instead of changing the function provide a new
one to be used at those sites. I'll work on that. and submit a new patch
fairly shortly.Meh, I'm not at all sure that's an improvement. A version of
heap_attisnull that ignores the possibility of a "missing" attribute value
will give the *wrong answer* if the other version should have been used
instead. Furthermore it'd be an attractive nuisance because people would
fail to notice if an existing call were now unsafe. If we're to do this
I think we have no choice but to impose that compatibility break on
third-party code.
Fair point.
In general, you need to be thinking about this in terms of making sure
that it's impossible to accidentally not update code that needs to be
updated. Another example is that I noticed that some places the patch
has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
the former will fail to copy the missing-attribute support data.
I think that's an astonishingly bad idea. There is basically no situation
in which CreateTupleDescCopy defined in that way will ever be safe to use.
The missing-attribute info is now a fundamental part of a tupdesc and it
has to be copied always, just as much as e.g. atttypid.I would opine that it's a mistake to design TupleDesc as though the
missing-attribute data had anything to do with default values. It may
have originated from the same place but it's a distinct thing. Better
to store it in a separate sub-structure.
OK, will work on all that. Thanks again.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/06/2017 11:26 AM, Andrew Dunstan wrote:
In general, you need to be thinking about this in terms of making sure
that it's impossible to accidentally not update code that needs to be
updated. Another example is that I noticed that some places the patch
has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
the former will fail to copy the missing-attribute support data.
I think that's an astonishingly bad idea. There is basically no situation
in which CreateTupleDescCopy defined in that way will ever be safe to use.
The missing-attribute info is now a fundamental part of a tupdesc and it
has to be copied always, just as much as e.g. atttypid.I would opine that it's a mistake to design TupleDesc as though the
missing-attribute data had anything to do with default values. It may
have originated from the same place but it's a distinct thing. Better
to store it in a separate sub-structure.OK, will work on all that. Thanks again.
Looking closer at this. First, there is exactly one place where the
patch does s/CreateTupleDescCopy/CreateTupleDescCopyConstr/.
making this like atttypid suggests that it belongs right outside the
TupleConstr structure. But then it seems to me that the change could
well end up being a darn site more invasive. For example, should we be
passing the number of missing values to CreateTemplateTupleDesc and
CreateTupleDesc? I'm happy to do whatever work is required, but I want
to have a firm understanding of the design before I spend lots of time
cutting code. Once I understand how tupdesc.h should look I should be good.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/12/2017 02:13 PM, Andrew Dunstan wrote:
On 12/06/2017 11:26 AM, Andrew Dunstan wrote:
In general, you need to be thinking about this in terms of making sure
that it's impossible to accidentally not update code that needs to be
updated. Another example is that I noticed that some places the patch
has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
the former will fail to copy the missing-attribute support data.
I think that's an astonishingly bad idea. There is basically no situation
in which CreateTupleDescCopy defined in that way will ever be safe to use.
The missing-attribute info is now a fundamental part of a tupdesc and it
has to be copied always, just as much as e.g. atttypid.I would opine that it's a mistake to design TupleDesc as though the
missing-attribute data had anything to do with default values. It may
have originated from the same place but it's a distinct thing. Better
to store it in a separate sub-structure.OK, will work on all that. Thanks again.
Looking closer at this. First, there is exactly one place where the
patch does s/CreateTupleDescCopy/CreateTupleDescCopyConstr/.making this like atttypid suggests that it belongs right outside the
TupleConstr structure. But then it seems to me that the change could
well end up being a darn site more invasive. For example, should we be
passing the number of missing values to CreateTemplateTupleDesc and
CreateTupleDesc? I'm happy to do whatever work is required, but I want
to have a firm understanding of the design before I spend lots of time
cutting code. Once I understand how tupdesc.h should look I should be good.
Here is a new version of the patch. It separates the missing values
constructs and logic from the default values constructs and logic. The
missing values now sit alongside the default values in tupleConstr. In
some places the separation makes the logic a good bit cleaner.
Some of the logic is also reworked to remove an assumption that the
missing values structure is populated in attnum order, Also code to
support the missing stuctures is added to equalTupleDescs.
We still have that one extra place where we need to call
CreateTupleDescCopyConstr instead of CreateTupleDescCopy. I haven't seen
an easy way around that.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
fast_default-v4.patchtext/x-patch; name=fast_default-v4.patchDownload+1890-236
On 12/31/2017 06:48 PM, Andrew Dunstan wrote:
Here is a new version of the patch. It separates the missing values
constructs and logic from the default values constructs and logic. The
missing values now sit alongside the default values in tupleConstr. In
some places the separation makes the logic a good bit cleaner.Some of the logic is also reworked to remove an assumption that the
missing values structure is populated in attnum order, Also code to
support the missing stuctures is added to equalTupleDescs.We still have that one extra place where we need to call
CreateTupleDescCopyConstr instead of CreateTupleDescCopy. I haven't seen
an easy way around that.
New version of the patch that fills in the remaining piece in
equalTupleDescs.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
fast_default-v5.patchtext/x-patch; name=fast_default-v5.patchDownload+1897-236
On 2 January 2018 at 05:01, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
New version of the patch that fills in the remaining piece in
equalTupleDescs.
This no longer applies to current master. Can send an updated patch?
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 01/16/2018 12:13 AM, David Rowley wrote:
On 2 January 2018 at 05:01, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:New version of the patch that fills in the remaining piece in
equalTupleDescs.This no longer applies to current master. Can send an updated patch?
Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
updated version. Thanks for looking.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
fast_default-v6.patchtext/x-patch; name=fast_default-v6.patchDownload+1893-236
On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
updated version. Thanks for looking.
A boring semi-automated update: this no long compiles, because
8561e4840c8 added a new call to heap_attisnull(). Pretty sure it just
needs NULL in the third argument.
--
Thomas Munro
http://www.enterprisedb.com
On Thu, Jan 25, 2018 at 6:10 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Wed, Jan 17, 2018 at 2:21 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:Yeah, got caught by the bki/pg_attribute changes on Friday. here's an
updated version. Thanks for looking.A boring semi-automated update: this no long compiles, because
8561e4840c8 added a new call to heap_attisnull(). Pretty sure it just
needs NULL in the third argument.
Yeah, thanks. revised patch attached
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
fast_default-v7.patchapplication/octet-stream; name=fast_default-v7.patchDownload+1894-237
Hi,
On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote:
Yeah, thanks. revised patch attached
Given that this patch touches code that's a huge bottleneck in a lot of
cases, I think this needs benchmarks that heavily exercises tuple
deforming.
Greetings,
Andres Freund
On 02/01/2018 02:54 PM, Andres Freund wrote:
Hi,
On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote:
Yeah, thanks. revised patch attached
Given that this patch touches code that's a huge bottleneck in a lot of
cases, I think this needs benchmarks that heavily exercises tuple
deforming.
That's a reasonable request, I guess, and I tried to collect such data
today. I measured two cases:
1) Table with 1000 columns and 64 rows, when there were no ALTER TABLE
adding columns with 'fast' defaults. This is meant to measure the best
case, with minimal impact from the patch. See the create.sql script
attached to this message, and the query.sql which was used for tests
using pgbench like this:
pgbench -n -f q.sql -T 15 test
after 100 runs, the results (tps) look like this:
min max median
--------------------------------------
master 1827 1873 1860
patched 2023 2066 2056
That is, the patch apparently improves the performance by about 10%
(according to perf profiles this is due to slot_deform_tuple getting
cheaper).
So this case seems fine.
2) Table with 64 rows and 1000 columns, all added by ALTER TABLE with
fast default without rewrite. See create-alter.sql.
Using the same query.sql as before, this shold significant drop to only
about 40 tps (from ~2000 tps for master). The profiles something like this:
+ 98.87% 98.87% postgres [.] slot_getmissingattrs
+ 98.77% 0.00% postgres [.] PortalRun
+ 98.77% 0.00% postgres [.] ExecAgg
+ 98.74% 0.01% postgres [.] ExecInterpExpr
which is kinda understandable, although the 2000 to 40 tps seems like a
pretty significant drop. But then again, this case is constructed like a
fairly extreme corner case.
However, there seems to be some sort of bug, because when I did VACUUM
FULL - ideally this would replace the "missing" default values with
actual values stored in the heap rows, eliminating the performance
impact. But the default values got lost and replaced by NULL values,
which seems like a clear data loss scenario.
I'm not quite sure what's wrong, but try this:
\i create-alter.sql
-- this returns 64, which is correct
SELECT COUNT(*) FROM t;
-- this actually retuns 64 rows with values "1"
SELECT c1000 FROM t;
-- this returns 63, which is incorrect (should be 64)
SELECT count(c1000) FROM t;
VACUUM FULL t;
-- suddenly we only get NULL values for all 64 rows
SELECT c1000 FROM t;
-- and now we got 0 (instead of 64)
SELECT count(c1000) FROM t;
regard
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
Yeah, thanks. revised patch attached
FYI the identity regression test started failing recently with this
patch applied (maybe due to commit
533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
*** /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/identity.out
2018-02-04 00:56:12.146303616 +0000
--- /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/identity.out
2018-02-04 01:05:55.217932032 +0000
***************
*** 257,268 ****
INSERT INTO itest13 VALUES (1), (2), (3);
-- add column to populated table
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
SELECT * FROM itest13;
! a | b | c
! ---+---+---
! 1 | 1 | 1
! 2 | 2 | 2
! 3 | 3 | 3
(3 rows)
-- various ALTER COLUMN tests
--- 257,269 ----
INSERT INTO itest13 VALUES (1), (2), (3);
-- add column to populated table
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
+ ERROR: column "c" contains null values
SELECT * FROM itest13;
! a | b
! ---+---
! 1 | 1
! 2 | 2
! 3 | 3
(3 rows)
-- various ALTER COLUMN tests
--
Thomas Munro
http://www.enterprisedb.com
On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:Yeah, thanks. revised patch attached
FYI the identity regression test started failing recently with this
patch applied (maybe due to commit
533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:Yeah, thanks. revised patch attached
FYI the identity regression test started failing recently with this
patch applied (maybe due to commit
533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
Here's a version that fixes the above issue and also the issue with
VACUUM that Tomas Vondra reported. I'm still working on the issue with
aggregates that Tomas also reported.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
fast_default-v8.patchapplication/octet-stream; name=fast_default-v8.patchDownload+1897-242
On 09/02/18 06:24, Andrew Dunstan wrote:
On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:Yeah, thanks. revised patch attached
FYI the identity regression test started failing recently with this
patch applied (maybe due to commit
533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
Here's a version that fixes the above issue and also the issue with
VACUUM that Tomas Vondra reported. I'm still working on the issue with
aggregates that Tomas also reported.
I see the patch does not update the ALTER TABLE docs section which
discusses table rewrites and it seems like it should.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Feb 11, 2018 at 2:50 PM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
Here's a version that fixes the above issue and also the issue with
VACUUM that Tomas Vondra reported. I'm still working on the issue with
aggregates that Tomas also reported.I see the patch does not update the ALTER TABLE docs section which
discusses table rewrites and it seems like it should.
Umm it changes the second and third paras of the Notes section, which
refer to rewrites. Not sure what else it should change.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:Yeah, thanks. revised patch attached
FYI the identity regression test started failing recently with this
patch applied (maybe due to commit
533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
Here's a version that fixes the above issue and also the issue with
VACUUM that Tomas Vondra reported. I'm still working on the issue with
aggregates that Tomas also reported.
This version fixes that issue. Thanks to Tomas for his help in finding
where the problem was. There are now no outstanding issues that I'm
aware of.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services