ALTER TABLE MODIFY COLUMN

Started by Mark Butleralmost 25 years ago4 messages
#1Mark Butler
butlerm@middle.net

I was looking at how hard it would be to support altering column types and it
seems to me that it would be trivial to support changing nullability,
increasing the maximum length of the VARCHAR data type and increasing the
precision or scale of the DECIMAL / NUMERIC data type.

Oracle allows you to update a column to null and then modify its data type to
any other type. This is easy because it stores all columns in a variable
length format and a null looks the same regardless of type.

I understand that with the current heap tuple format described in
backend/access/common.c that changing the type of any fixed length attribute
requires updating every row.

Surely if we have an write exclusive table lock we can rewrite tuples in place
rather than creating new versions with its corresponding 2x space requirement.

We could presumably do the following to change a column data type:

Preconditions:
1. Type conversion is possible from old type to new type and either
a) old type is unconditionally convertible (e.g. length/precision
increasing)
b) read locked scan of table reveals that all values are convertible
2. Exclusive write lock on table

if(new and old types are variable length and are binary compatible)
{
1. Change type in catalog
2. Done
}
else
{
1. Visit all current tuples and rewrite in place, converting attribute value
to new
type, and shifting all other attributes and null bitmask appropriately
2. Change type in catalog
3. Done
}

Does this sound reasonable? Also, is anyone working on ALTER TABLE DROP
COLUMN right now?

Speaking of which, couldn't we make it so that UPDATES and DELETES running
under an exclusive table lock do an inline vacuum?

- Mark Butler

#2Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Mark Butler (#1)
Re: ALTER TABLE MODIFY COLUMN

Mark Butler wrote:

I was looking at how hard it would be to support altering column types and it
seems to me that it would be trivial to support changing nullability,

Yes. The problem is how to formulate 'DROP CONSTRAINT' feature.

increasing the maximum length of the VARCHAR data type and increasing the
precision or scale of the DECIMAL / NUMERIC data type.

Yes. The problem is how PostgreSQL could recognize the fact.

[snip]

I understand that with the current heap tuple format described in
backend/access/common.c that changing the type of any fixed length attribute
requires updating every row.

Surely if we have an write exclusive table lock we can rewrite tuples in place
rather than creating new versions with its corresponding 2x space requirement.

PostgreSQL has a no overwrite storage manager, so this
seems to have little advantage. We now have a mechanism
to replace existent relation files safely. We could
avoid running VACUUM, multiple version of tuples in
a file etc ...

regards,
Hiroshi Inoue

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Butler (#1)
Re: ALTER TABLE MODIFY COLUMN

Mark Butler <butlerm@middle.net> writes:

Surely if we have an write exclusive table lock we can rewrite tuples
in place rather than creating new versions with its corresponding 2x
space requirement.

Nyet. Consider transaction rollback.

regards, tom lane

#4Mark Butler
butlerm@middle.net
In reply to: Mark Butler (#1)
Re: ALTER TABLE MODIFY COLUMN

Tom Lane wrote:

Mark Butler <butlerm@middle.net> writes:

Surely if we have an write exclusive table lock we can rewrite tuples
in place rather than creating new versions with its corresponding 2x
space requirement.

Nyet. Consider transaction rollback.

Well, the first thing to consider would be to make this type of DDL operation
un-abortable. If the database goes down while the table modification is in
progress, the recovery process could continue the operation to completion
before releasing the table for general access.

The problem with the standard alternatives is that they waste space and are
slow:

Alt 1. create new version of tuples in new format like DROP COLUMN proposal
Alt 2. rename column; add new column; copy column; drop (hide) old column
Alt 3. rename indices; rename table; copy table; recreate indices;

Now this probably only makes a difference in a data warehouse environment,
where the speed
of mass load / update operations is much more important than being able to
roll them back.

I suppose there are two really radical alternatives as well:

Radical Alt 1: Use format versioning to allow multiple row formats to
co-exist,
lazy update to latest format

Radical Alt 2: Delta compress different versions of the same row on the same
page

I can see that the conventional alternatives make sense for now, however.

- Mark Butler