Fast AT ADD COLUMN with DEFAULTs

Started by Serge Rielauover 9 years ago37 messageshackers

serge@rielau.com

over 9 years ago

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table without a table rewrite (i.e. at O(1) cost) from the nullable-without-default to a more general case. E.g. CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); INSERT INTO T VALEUS (1), (2), (3); ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5; INSERT INTO T VALUES (4, DEFAULT); ALTER TABLE T ALTER COLUMN SET DEFAULT 6; INSERT INTO T VALUS (5, DEFAULT); SELECT * FROM T ORDER BY pk; => (1, 5), (2, 5), (3, 5), (4, 5), (5, 6);
Rows 1-3 have never been updated, yet they know that their values of c1 is 5.
The requirement is driven by large tables for which add column takes too much time and/or produces too large a transaction for comfort.
In simplified terms: * a second “exist default” is computed and stored in the catalogs at time of AT ADD COLUMN * The exist default is cached in the tuple descriptor (e.g in attrdef) * When one of the getAttr or copytuple related routines is invoked the exist default is filled in instead of simply NULL padding if the tuple is shorter the requested attribute number.

Is there an interest in principle in the community for this functionality?
Cheers Serge Rielau Salesforce.com
via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2]

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Serge Rielau (#1)

Re: Fast AT ADD COLUMN with DEFAULTs

On 10/5/16, Serge Rielau <serge@rielau.com> wrote:

Dear Hackers,

I’m working on a patch that expands PG’s ability to add columns to a table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case. E.g.

CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
INSERT INTO T VALEUS (1), (2), (3);
ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
INSERT INTO T VALUES (4, DEFAULT);
ALTER TABLE T ALTER COLUMN SET DEFAULT 6;
INSERT INTO T VALUS (5, DEFAULT);
SELECT * FROM T ORDER BY pk;
=>
(1, 5),
(2, 5),
(3, 5),
(4, 5),
(5, 6);

Rows 1-3 have never been updated, yet they know that their values of c1 is
5.

The requirement is driven by large tables for which add column takes too
much time and/or produces too large a transaction for comfort.

In simplified terms:

* a second “exist default” is computed and stored in
the catalogs at time of AT ADD COLUMN

* The exist default is cached in the tuple descriptor (e.g in attrdef)

* When one of the getAttr or copytuple related routines is invoked
the exist default is filled in instead of simply NULL padding if the
tuple is shorter the requested attribute number.

Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed.

But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases...

--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Serge Rielau

serge@rielau.com

over 9 years ago

In reply to: Vitaly Burovoy (#2)

Re: Fast AT ADD COLUMN with DEFAULTs

On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
On 10/5/16, Serge Rielau <serge@rielau.com> wrote:

Dear Hackers,

I’m working on a patch that expands PG’s ability to add columns to a table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case. E.g.

...

Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed. Yes, Yes, and still analyzing speed
But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk; By solution I think you mean a semantic change from what it is doing today which is: * “Now” is fixed to ALTER TABLE time for all pre-existing rows * serial will fill in the same value for all pre-existing rows Having different semantics for those would require a rewrite and probably different syntax in some form.
This is what my patch does on our PG derivative today: CREATE TABLE t (pk INT NOT NULL PRIMARY KEY); CREATE TABLE postgres=# INSERT INTO t VALUES (1), (2), (3); INSERT 0 3 postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now'; ALTER TABLE postgres=# SELECT * FROM t ORDER BY pk; pk | c1 ----+------------------------------- 1 | 2016-10-05 21:47:58.919194+00 2 | 2016-10-05 21:47:58.919194+00 3 | 2016-10-05 21:47:58.919194+00 (3 rows)
postgres=# postgres=# ALTER TABLE t ADD COLUMN c2 serial; SELECT * FROM t ORDER BY pk; INSERT INTO t(pk) VALUES (4); SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial; ALTER TABLE postgres=# SELECT * FROM t ORDER BY pk; pk | c1 | c2 ----+-------------------------------+---- 1 | 2016-10-05 21:47:58.919194+00 | 1 2 | 2016-10-05 21:47:58.919194+00 | 1 3 | 2016-10-05 21:47:58.919194+00 | 1 (3 rows)
postgres=# INSERT INTO t(pk) VALUES (4); INSERT 0 1 postgres=# SELECT * FROM t ORDER BY pk; pk | c1 | c2 ----+-------------------------------+---- 1 | 2016-10-05 21:47:58.919194+00 | 1 2 | 2016-10-05 21:47:58.919194+00 | 1 3 | 2016-10-05 21:47:58.919194+00 | 1 4 | 2016-10-05 21:47:58.919194+00 | 2 (4 rows) P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases... Thanks. This would be my first contribution. I take it I would post a patch based on a recent PG 9.6 master for review? Or should I compose some sort of a design document?
Cheers Serge Rielau Salesforce.com

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Serge Rielau (#1)

Re: Fast AT ADD COLUMN with DEFAULTs

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table without a table rewrite (i.e. at O(1) cost) from the nullable-without-default to a more general case. E.g. CREATE TABLE T(pk INT NOT NULL PRIMARY KEY); INSERT INTO T VALEUS (1), (2), (3); ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5; INSERT INTO T VALUES (4, DEFAULT); ALTER TABLE T ALTER COLUMN SET DEFAULT 6; INSERT INTO T VALUS (5, DEFAULT); SELECT * FROM T ORDER BY pk; => (1, 5), (2, 5), (3, 5), (4, 5), (5, 6);
Rows 1-3 have never been updated, yet they know that their values of c1 is 5.
The requirement is driven by large tables for which add column takes too much time and/or produces too large a transaction for comfort.
In simplified terms: * a second “exist default” is computed and stored in the catalogs at time of AT ADD COLUMN * The exist default is cached in the tuple descriptor (e.g in attrdef) * When one of the getAttr or copytuple related routines is invoked the exist default is filled in instead of simply NULL padding if the tuple is shorter the requested attribute number.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Andres Freund (#4)

Re: Fast AT ADD COLUMN with DEFAULTs

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Vitaly Burovoy (#5)

Re: Fast AT ADD COLUMN with DEFAULTs

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Serge Rielau

serge@rielau.com

over 9 years ago

In reply to: Vitaly Burovoy (#5)

Re: Fast AT ADD COLUMN with DEFAULTs

On Wed, Oct 5, 2016 at 3:23 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Andres Freund (#6)

Re: Fast AT ADD COLUMN with DEFAULTs

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a
table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

I'm sorry, while I was writting "pre-alter" I meant
"pre-alter-add-column" era (not "pre-alter-set-default"), all later
default changes "current" default, whereas "pre-alter-add-column" adds
value if current column number < TupleDesc.natts.

All your DDL are in the "post-alter-add-column" era.

--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Vitaly Burovoy (#8)

Re: Fast AT ADD COLUMN with DEFAULTs

On 10/5/16, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a
table
without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default
will
still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

I'm sorry, while I was writting "pre-alter" I meant
"pre-alter-add-column" era (not "pre-alter-set-default"), all later
default changes "current" default, whereas "pre-alter-add-column" adds
value if current column number < TupleDesc.natts.

All your DDL are in the "post-alter-add-column" era.

I'm so sorry, I was in a hurry. Of course,
- if current column number < TupleDesc.natts.
+ if current column number > TupleDesc.natts.

--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: Andres Freund (#6)

Re: Fast AT ADD COLUMN with DEFAULTs

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a

table

without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

My understanding is that all of those would be materialized. The only
default that isn't materialized is the one in effect in the same statement
in which that column was added. Since a column can only be added once, the
default in effect at the time the column was added can never change, no
matter what you do to the default later on.

Cheers,

Jeff

#11

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Jeff Janes (#10)

Re: Fast AT ADD COLUMN with DEFAULTs

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a

table

without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default will
still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

My understanding is that all of those would be materialized.

But that'd require a table rewrite, as none of the above INSERTs were
done when a default was in place. But each has a different "applicable"
default value.

The only
default that isn't materialized is the one in effect in the same statement
in which that column was added. Since a column can only be added once, the
default in effect at the time the column was added can never change, no
matter what you do to the default later on.

DROP DEFAULT pretty much does that, because it allows multiple (set of)
rows with no value (or a NULL) for a specific column, but with differing
applicable default values.

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Andres Freund (#11)

Re: Fast AT ADD COLUMN with DEFAULTs

Andres Freund <andres@anarazel.de> writes:

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.

Need a better name for the concept, since evidently this name isn't
conveying the idea.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Pantelis Theodosiou

ypercube@gmail.com

over 9 years ago

In reply to: Jeff Janes (#10)

Re: Fast AT ADD COLUMN with DEFAULTs

On Wed, Oct 5, 2016 at 11:44 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:23:05 -0700, Vitaly Burovoy wrote:

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 11:58:33 -0700, Serge Rielau wrote:

Dear Hackers,
I’m working on a patch that expands PG’s ability to add columns to a

table

without a table rewrite (i.e. at O(1) cost) from the
nullable-without-default to a more general case.

If I understand this proposal correctly, altering a column default

will

still have trigger a rewrite unless there's previous default?

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

My understanding is that all of those would be materialized. The only
default that isn't materialized is the one in effect in the same statement
in which that column was added. Since a column can only be added once, the
default in effect at the time the column was added can never change, no
matter what you do to the default later on.

Cheers,

Jeff

I understood the same thing. That when the column is added the "DEFAULT
constant" means 2 things:

-a- existing rows get a value of that constant (that is not actually
written in the rows, but kept (hidden from the user) in the system tables
and only written in the rows that are updated, vacuumed, etc) and
-b- new rows, inserted after the ADD COLUMN will get the DEFAULT constant,
same way as a normal column definition would.

The b part can easily be changed later with an ALTER COLUMN that sets a new
DEFAULT.
The a part is never changed - but possibly deleted from the system table
when all rows existing before the ADD COLUMN have been updated.

Pantelis

#14

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Andres Freund (#11)

Re: Fast AT ADD COLUMN with DEFAULTs

On 10/5/16, Andres Freund <andres@anarazel.de> wrote:

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:

On Wed, Oct 5, 2016 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:

ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 1;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 1;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 2;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 2;
ALTER TABLE foo ALTER COLUMN withdefault DROP DEFAULT;
INSERT id = 3;
ALTER TABLE foo ALTER COLUMN withdefault SET DEFAULT 3;

The result here would be that there's three rows with a default value
for foo that's the same as their id. None of them has that column
present in the row.

My understanding is that all of those would be materialized.

But that'd require a table rewrite, as none of the above INSERTs were
done when a default was in place.

Since they did not have the default value, that tuples are written
with actual TupleDesc.natts where att_isnull for "withdefault" column
is set (actually the column does not have default for inserted tuples
in your case).

But each has a different "applicable" default value.

No, their values are constructed "from scratch", not fetched from a
heap, so "pre-alter-add-column" default is not applicable for them.

The only
default that isn't materialized is the one in effect in the same
statement
in which that column was added. Since a column can only be added once,
the
default in effect at the time the column was added can never change, no
matter what you do to the default later on.

DROP DEFAULT pretty much does that, because it allows multiple (set of)
rows with no value (or a NULL) for a specific column, but with differing
applicable default values.

DROP DEFAULT is for "post-alter-add-column" tuples, it does not
affects "pre-alter-add-column" ones.

--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Tom Lane (#12)

Re: Fast AT ADD COLUMN with DEFAULTs

On 2016-10-05 18:58:47 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

Err, yes. I forgot that altering the default of an existing column
doesn't set the default for existing values. Sorry for the noise.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Tom Lane (#12)

Re: Fast AT ADD COLUMN with DEFAULTs

I wrote:

Need a better name for the concept, since evidently this name isn't
conveying the idea.

Maybe "creation default" would work better? Point being it's the
default value at the time of column creation.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Serge Rielau

serge@rielau.com

over 9 years ago

In reply to: Tom Lane (#12)

Re: Fast AT ADD COLUMN with DEFAULTs

via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2]
On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.
You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases. Explained so much better than I could do it :-)
I want to point out as a minor “extension” that there is no need for the default to be immutable. It is merely required that the default is evaluate at time of ADD COLUMN and then we remember the actual value for the exist default, rather than the parsed expression as we do for the “current” default. Need a better name for the concept, since evidently this name isn't
conveying the idea. By all means. Got anything in mind?
Cheers Serge Rielau

#18

Pantelis Theodosiou

ypercube@gmail.com

over 9 years ago

In reply to: Serge Rielau (#17)

Re: Fast AT ADD COLUMN with DEFAULTs

On Thu, Oct 6, 2016 at 12:05 AM, Serge Rielau <serge@rielau.com> wrote:

via Newton Mail
<https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2>

On Wed, Oct 5, 2016 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2016-10-05 15:44:56 -0700, Jeff Janes wrote:

No, "a second “exist default"" was mentioned, i.e. it is an additional
column in a system table (pg_attribute) as default column values of
the "pre-alter" era. It solves changing of the default expression of
the same column later.

Don't think that actually solves the issue. The default might be unset
for a while, for example. Essentially you'd need to be able to associate
arbitrary number of default values with an arbitrary set of rows.

I think it does work, as long as the "exists default" is immutable.
(For safety, personally, I'd restrict it to be a verbatim constant.)
The point is that you apply that when you are reading a row that has
so few columns that it must predate the original ALTER TABLE ADD COLUMN.
Subsequent ADD/DROP/SET DEFAULT affect the "active default" and hence
insertions that happen after them, but they don't affect the
interpretation of old rows. And of course all rows inserted after the
ADD COLUMN contain explicit values of the column, so their meaning is
unaffected in any case.

You do need two defaults associated with a column to make this work.
The "exists default" never changes after the column is added. But
in principle, the "exists default" just replaces the NULL value that
we implicitly insert now in such cases.

Explained so much better than I could do it :-)

I want to point out as a minor “extension” that there is no need for the
default to be immutable. It is merely required that the default is evaluate
at time of ADD COLUMN
and then we remember the actual value for the exist default, rather than
the parsed expression as we do for the “current” default.

Need a better name for the concept, since evidently this name isn't
conveying the idea.

By all means. Got anything in mind?

For comparison, SQL Server's implementation. They have a similar feature
(in their Enterprise only edition).
From https://msdn.microsoft.com/en-us/library/ms190273.aspx :

Adding NOT NULL Columns as an Online Operation

Starting with SQL Server 2012 Enterprise Edition, adding a NOT NULL column
with a default value is an online operation when the default value is
a *runtime
constant*. This means that the operation is completed almost
instantaneously regardless of the number of rows in the table. This is
because the existing rows in the table are not updated during the
operation; instead, the default value is stored only in the metadata of the
table and the value is looked up as needed in queries that access these
rows. This behavior is automatic; no additional syntax is required to
implement the online operation beyond the ADD COLUMN syntax. A runtime
constant is an expression that produces the same value at runtime for each
row in the table regardless of its determinism. For example, the constant
expression "My temporary data", or the system function GETUTCDATETIME() are
runtime constants. In contrast, the functions NEWID() or NEWSEQUENTIALID()
are not runtime constants because a unique value is produced for each row
in the table. Adding a NOT NULL column with a default value that is not a
runtime constant is always performed offline and an exclusive (SCH-M) lock
is acquired for the duration of the operation.

While the existing rows reference the value stored in metadata, the default
value is stored on the row for any new rows that are inserted and do not
specify another value for the column. The default value stored in metadata
is moved to an existing row when the row is updated (even if the actual
column is not specified in the UPDATE statement), or if the table or
clustered index is rebuilt.

#19

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Tom Lane (#16)

Re: Fast AT ADD COLUMN with DEFAULTs

On 10/5/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Need a better name for the concept, since evidently this name isn't
conveying the idea.

Maybe "creation default" would work better? Point being it's the
default value at the time of column creation.

Hmm... Personaly for me the original topic name is good enough.

But what I discover for myself is that we have pg_attrdef separately
from the pg_attribute. Why?
Is it time to join them? For not presented defaults it would be only
one bit per row(if we avoid "adsrc" as it is recommended), but for a
separate table it is 11 columns with two indexes now...
--
Best regards,
Vitaly Burovoy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Serge Rielau

serge@rielau.com

over 9 years ago

In reply to: Vitaly Burovoy (#19)

Re: Fast AT ADD COLUMN with DEFAULTs

via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=9.0.74&pv=10.11.6&source=email_footer_2]
On Wed, Oct 5, 2016 at 4:19 PM, Vitaly Burovoy <vitaly.burovoy@gmail.com> wrote:
On 10/5/16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Need a better name for the concept, since evidently this name isn't
conveying the idea.

Maybe "creation default" would work better? Point being it's the
default value at the time of column creation.

Hmm... Personaly for me the original topic name is good enough. I think at issue is with the term “exist default” rather than the feature/topic name (?) But what I discover for myself is that we have pg_attrdef separately
from the pg_attribute. Why?
Is it time to join them? For not presented defaults it would be only
one bit per row(if we avoid "adsrc" as it is recommended), but for a
separate table it is 11 columns with two indexes now... In terms of footprint we may be able to remove pg_attrdef. I would consider that orthogonal to the proposed feature though. The internal representation of defaults in the tuple descriptor still needs to be a map of sorts.
To comment on Pantelis SQL Server Reference: Other vendors such as Oracle and DB2 also support this feature.
The listed restriction made me loop back to Vitaly’s original serial example: ALTER TABLE t ADD COLUMN c2 serial; and rethink Tom’s struct restriction to constants.
In PG the proposed feature would also have to be limited to immutable(?) default expressions to comply with existing behavior, which matches SQL Servers.
My current patch does not restrict that and thusly falsely "fills in" the same value for all rows.
Cheers Serge Rielau Salesforce.com

#21

Vitaly Burovoy

vitaly.burovoy@gmail.com

over 9 years ago

In reply to: Serge Rielau (#20)

#22

Serge Rielau

serge@rielau.com

over 9 years ago

In reply to: Vitaly Burovoy (#21)

#23