Re: ALTER TABLE DROP COLUMN
OK, I am opening this can of worms again. I personally would like to
see this code activated, even if it does take 2x the disk space to alter
a column. Hiroshi had other ideas. Where did we leave this? We have
one month to decide on a plan.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
You can exclusively lock the table, then do a heap_getnext() scan over
the entire table, remove the dropped column, do a heap_insert(), then a
heap_delete() on the current tuple, making sure to skip over the tuples
inserted by the current transaction. When completed, remove the column
from pg_attribute, mark the transaction as committed (if desired), and
run vacuum over the table to remove the deleted rows.

Hmm, that would work --- the new tuples commit at the same instant that
the schema updates commit, so it should be correct. You have the 2x
disk usage problem, but there's no way around that without losing
rollback ability.

A potentially tricky bit will be persuading the tuple-reading and tuple-
writing subroutines to pay attention to different versions of the tuple
structure for the same table. I haven't looked to see if this will be
difficult or not. If you can pass the TupleDesc explicitly then it
shouldn't be a problem.

I'd suggest that the cleanup vacuum *not* be an automatic part of
the operation; just recommend that people do it ASAP after dropping
a column. Consider needing to drop several columns...

regards, tom lane
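The rewrite scheme sketched above can be illustrated with a toy simulation (plain Python, not PostgreSQL internals; `drop_column_rewrite`, the visibility flags, and the in-memory "table" are all invented for illustration). Old and new tuple versions coexist until "commit", which is where the 2x peak disk usage comes from:

```python
# Toy model of the rewrite-based DROP COLUMN: scan the table, insert a
# copy of each tuple minus the dropped attribute, and make the copies
# visible atomically at commit. A later VACUUM would reclaim dead space.

def drop_column_rewrite(table, drop_idx):
    """table is a list of (visible, tuple) pairs; returns peak row count."""
    originals = list(table)               # snapshot: skip our own inserts
    pending = []
    for visible, tup in originals:
        if not visible:
            continue
        new_tup = tup[:drop_idx] + tup[drop_idx + 1:]
        pending.append(new_tup)
        table.append((False, new_tup))    # "heap_insert": not yet visible

    peak = len(table)                     # old + new versions coexist here

    # "Commit": old versions die and new versions become visible at the
    # same instant, matching the schema update.
    table[:] = [(True, t) for t in pending]
    return peak

table = [(True, (1, 'a', 10)), (True, (2, 'b', 20))]
peak = drop_column_rewrite(table, 1)
print(table)   # column 1 gone from every visible tuple
print(peak)    # 4: twice the live row count, just before commit
```

The point of the simulation is only the shape of the operation: nothing becomes visible until the single commit step, which is also what makes it rollback-safe.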
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
OK, I am opening this can of worms again. I personally would like to
see this code activated, even if it does take 2x the disk space to alter
a column. Hiroshi had other ideas. Where did we leave this? We have
one month to decide on a plan.
I think the plan should be to do nothing for 7.1. ALTER DROP COLUMN
isn't an especially pressing feature, and so I don't feel that we
should be hustling to squeeze it in just before beta. We're already
overdue for beta.
regards, tom lane
-----Original Message-----
From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
OK, I am opening this can of worms again. I personally would like to
see this code activated, even if it does take 2x the disk space to alter
a column. Hiroshi had other ideas. Where did we leave this? We have
one month to decide on a plan.

I think the plan should be to do nothing for 7.1. ALTER DROP COLUMN
isn't an especially pressing feature, and so I don't feel that we
should be hustling to squeeze it in just before beta. We're already
overdue for beta.
Seems some people expect the implementation in 7.1.
(recent [GENERAL] drop column?)
I could commit my local branch if people don't mind
backward incompatibility.
I've maintained the branch for more than 1 month
and it implements the following TODOs.
* Add ALTER TABLE DROP COLUMN feature
* ALTER TABLE ADD COLUMN to inherited table put column in wrong place
* Prevent column dropping if column is used by foreign key
Comments ?
Hiroshi Inoue
P.S. I've noticed that get_rte_attribute_name() seems to
break my implementation. I'm not sure if I could solve it.
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
Seems some people expect the implementation in 7.1.
(recent [GENERAL] drop column?)
I could commit my local branch if people don't mind
backward incompatibility.
I've lost track --- is this different from the _DROP_COLUMN_HACK__
code that's already in CVS? I really really didn't like that
implementation :-(, but I forget what other methods were being
discussed.
P.S. I've noticed that get_rte_attribute_name() seems to
break my implementation. I'm not sure if I could solve it.
That would be a problem --- rule dumping depends on that code to
produce correct aliases, so making it work is not optional.
regards, tom lane
Tom Lane wrote:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
Seems some people expect the implementation in 7.1.
(recent [GENERAL] drop column?)
I could commit my local branch if people don't mind
backward incompatibility.

I've lost track --- is this different from the _DROP_COLUMN_HACK__
code that's already in CVS? I really really didn't like that
implementation :-(, but I forget what other methods were being
discussed.
My current local trial implementation follows your idea(logical/
physical attribute numbers).
P.S. I've noticed that get_rte_attribute_name() seems to
break my implementation. I'm not sure if I could solve it.

That would be a problem --- rule dumping depends on that code to
produce correct aliases, so making it work is not optional.
Your change has no problem if logical==physical attribute
numbers.
Regards.
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
P.S. I've noticed that get_rte_attribute_name() seems to
break my implementation. I'm not sure if I could solve it.

That would be a problem --- rule dumping depends on that code to
produce correct aliases, so making it work is not optional.
Your change has no problem if logical==physical attribute
numbers.
But if they're not, what do we do? Can we define the order of the
alias-name lists as being one or the other numbering? (Offhand I'd
say it should be logical numbering, but I haven't chased the details.)
If neither of those work, we'll need some more complex datastructure
than a simple list.
regards, tom lane
Tom Lane wrote:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
P.S. I've noticed that get_rte_attribute_name() seems to
break my implementation. I'm not sure if I could solve it.

That would be a problem --- rule dumping depends on that code to
produce correct aliases, so making it work is not optional.

Your change has no problem if logical==physical attribute
numbers.

But if they're not, what do we do? Can we define the order of the
alias-name lists as being one or the other numbering? (Offhand I'd
say it should be logical numbering, but I haven't chased the details.)
If neither of those work, we'll need some more complex datastructure
than a simple list.
I'm not sure if we could keep invariant attribute numbers.
Though I've used physical attribute numbers as many as possible
in my trial implementation, there's already an exception.
I had to use logical attribute numbers for FieldSelect node.
Regards.
Hiroshi Inoue
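The logical/physical attribute-number idea under discussion can be sketched as follows (a minimal Python illustration, not PostgreSQL source; the `TupleDesc` class and its methods here are invented for the example). Dropping a column removes it only from the logical numbering, while the physical tuple layout on disk stays untouched:

```python
# Sketch of logical vs. physical attribute numbers: stored tuples keep
# their physical layout forever; a dropped column simply disappears
# from the logical view, so no table rewrite (and no 2x disk) is needed.

class TupleDesc:
    def __init__(self, names):
        # logical order -> (name, physical position); identical at first
        self.attrs = [(n, i) for i, n in enumerate(names)]

    def drop_column(self, name):
        # Remove only the logical entry; physical storage is untouched.
        self.attrs = [a for a in self.attrs if a[0] != name]

    def project(self, raw_tuple):
        # Read a stored tuple through the logical numbering.
        return tuple(raw_tuple[phys] for _, phys in self.attrs)

desc = TupleDesc(['id', 'junk', 'value'])
stored = (7, 'to-be-dropped', 42)      # physical layout never changes
desc.drop_column('junk')
print(desc.project(stored))            # (7, 42): column hidden, not erased
```

This also shows where the pain comes from: every consumer of attribute numbers (rule dumping, `get_rte_attribute_name()`, FieldSelect) must agree on which numbering it is using.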
At 12:05 6/10/00 +0900, Hiroshi Inoue wrote:
Tom Lane wrote:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
P.S. I've noticed that get_rte_attribute_name() seems to
break my implementation. I'm not sure if I could solve it.

That would be a problem --- rule dumping depends on that code to
produce correct aliases, so making it work is not optional.

Your change has no problem if logical==physical attribute
numbers.

But if they're not, what do we do? Can we define the order of the
alias-name lists as being one or the other numbering? (Offhand I'd
say it should be logical numbering, but I haven't chased the details.)
If neither of those work, we'll need some more complex datastructure
than a simple list.

I'm not sure if we could keep invariant attribute numbers.
Though I've used physical attribute numbers as many as possible
in my trial implementation, there's already an exception.
I had to use logical attribute numbers for FieldSelect node.
Not really a useful suggestion at this stage, but it seems to me that
storing plans and/or parse trees is possibly a false economy. Would it be
worth considering storing the relevant SQL (or a parse tree with field &
table names) and compiling the rule in each backend the first time it is
used? (and keep it for the life of the backend).
This would allow underlying view tables to be deleted/added as well as make
the above problem go away. The 'parse tree with names' would also enable
easy construction of dependency information when and if that is implemented...
----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/
seconded ...
On Fri, 29 Sep 2000, Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
OK, I am opening this can of worms again. I personally would like to
see this code activated, even if it does take 2x the disk space to alter
a column. Hiroshi had other ideas. Where did we leave this? We have
one month to decide on a plan.

I think the plan should be to do nothing for 7.1. ALTER DROP COLUMN
isn't an especially pressing feature, and so I don't feel that we
should be hustling to squeeze it in just before beta. We're already
overdue for beta.

regards, tom lane
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Thu, 5 Oct 2000, Tom Lane wrote:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
Seems some people expect the implementation in 7.1.
(recent [GENERAL] drop column?)
I could commit my local branch if people don't mind
backward incompatibility.
there have been several ideas thrown back and forth ... the best one that
I saw, forgetting who suggested it, had to do with the idea of locking the
table and doing an effective vacuum on that table with a 'row re-write'
happening ...
Basically, move the first 100 rows to the end of the table file, then take
100 and write it to position 0, 101 to position 1, etc ... that way, at
max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table
size ... either method is going to lock the file for a period of time, but
one is much more friendly as far as disk space is concerned *plus*, if RAM
is available for this, it might even be something that the backend could
use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and
the table is 24Meg in size, it could do it all in memory?
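The chunk-at-a-time rewrite proposed here can be simulated in a few lines (a hypothetical Python toy, not how any released PostgreSQL works; `drop_column_chunked` and the chunk size are invented for illustration). Only one chunk of rows exists twice at any moment, so extra space is bounded by the chunk size rather than the whole table:

```python
# Toy version of the chunked in-place rewrite: set a chunk of rows
# aside, rewrite them without the dropped column, repeat. Extra space
# is bounded by the chunk, not by 2x the table.

def drop_column_chunked(rows, drop_idx, chunk=100):
    max_extra = 0
    for start in range(0, len(rows), chunk):
        # "Move the chunk aside": these rows briefly exist twice.
        scratch = rows[start:start + chunk]
        max_extra = max(max_extra, len(scratch))
        # Rewrite them in place minus the dropped column.
        for i, tup in enumerate(scratch):
            rows[start + i] = tup[:drop_idx] + tup[drop_idx + 1:]
    return max_extra

rows = [(i, 'x', i * 10) for i in range(250)]
extra = drop_column_chunked(rows, 1, chunk=100)
print(rows[0], rows[249])   # the 'x' column is gone everywhere
print(extra)                # 100: at most one chunk duplicated at a time
```

The trade-off Tom raises later in the thread applies directly to this sketch: if the process dies mid-loop, some rows have been rewritten and some have not, which is exactly the crash-safety problem the 2x approach avoids.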
Basically, move the first 100 rows to the end of the table file, then take
100 and write it to position 0, 101 to position 1, etc ... that way, at
max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table
size ... either method is going to lock the file for a period of time, but
one is much more friendly as far as disk space is concerned *plus*, if RAM
is available for this, it might even be something that the backend could
use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and
the table is 24Meg in size, it could do it all in memory?
Yes, I liked that too.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Basically, move the first 100 rows to the end of the table file, then take
100 and write it to position 0, 101 to position 1, etc ... that way, at
max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table
size ... either method is going to lock the file for a period of time, but
one is much more friendly as far as disk space is concerned *plus*, if RAM
is available for this, it might even be something that the backend could
use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and
the table is 24Meg in size, it could do it all in memory?
Yes, I liked that too.
What happens if you crash partway through?
I don't think it's possible to build a crash-robust rewriting ALTER
process that doesn't use 2X disk space: you must have all the old tuples
AND all the new tuples down on disk simultaneously just before you
commit. The only way around 2X disk space is to adopt some logical
renumbering approach to the columns, so that you can pretend the dropped
column isn't there anymore when it really still is.
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Basically, move the first 100 rows to the end of the table file, then take
100 and write it to position 0, 101 to position 1, etc ... that way, at
max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table
size ... either method is going to lock the file for a period of time, but
one is much more friendly as far as disk space is concerned *plus*, if RAM
is available for this, it might even be something that the backend could
use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and
the table is 24Meg in size, it could do it all in memory?

Yes, I liked that too.
What happens if you crash partway through?
I don't think it's possible to build a crash-robust rewriting ALTER
process that doesn't use 2X disk space: you must have all the old tuples
AND all the new tuples down on disk simultaneously just before you
commit. The only way around 2X disk space is to adopt some logical
renumbering approach to the columns, so that you can pretend the dropped
column isn't there anymore when it really still is.
Yes, I liked the 2X disk space, and making the new tuples visible all at
once at the end.
On Mon, 9 Oct 2000, Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Basically, move the first 100 rows to the end of the table file, then take
100 and write it to position 0, 101 to position 1, etc ... that way, at
max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table
size ... either method is going to lock the file for a period of time, but
one is much more friendly as far as disk space is concerned *plus*, if RAM
is available for this, it might even be something that the backend could
use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and
the table is 24Meg in size, it could do it all in memory?

Yes, I liked that too.
What happens if you crash partway through?
what happens if you crash partway through a vacuum?
I don't think it's possible to build a crash-robust rewriting ALTER
process that doesn't use 2X disk space: you must have all the old
tuples AND all the new tuples down on disk simultaneously just before
you commit. The only way around 2X disk space is to adopt some
logical renumbering approach to the columns, so that you can pretend
the dropped column isn't there anymore when it really still is.
how about a combination of the two? basically, we're gonna want a vacuum
of the table after the alter to clean out those extra columns that we've
marked as 'dead' ... basically, anything that avoids that whole 2x disk
space option is cool ...
On Mon, 9 Oct 2000, Bruce Momjian wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Basically, move the first 100 rows to the end of the table file, then take
100 and write it to position 0, 101 to position 1, etc ... that way, at
max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table
size ... either method is going to lock the file for a period of time, but
one is much more friendly as far as disk space is concerned *plus*, if RAM
is available for this, it might even be something that the backend could
use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and
the table is 24Meg in size, it could do it all in memory?

Yes, I liked that too.
What happens if you crash partway through?
I don't think it's possible to build a crash-robust rewriting ALTER
process that doesn't use 2X disk space: you must have all the old tuples
AND all the new tuples down on disk simultaneously just before you
commit. The only way around 2X disk space is to adopt some logical
renumbering approach to the columns, so that you can pretend the dropped
column isn't there anymore when it really still is.

Yes, I liked the 2X disk space, and making the new tuples visible all at
once at the end.
man, are you ever wishy-washy on this issue, aren't you? :) you like not
using 2x, you like using 2x ... :)
The Hermit Hacker <scrappy@hub.org> writes:
What happens if you crash partway through?
what happens if you crash partway through a vacuum?
Nothing. Vacuum is crash-safe. ALTER TABLE should be too.
regards, tom lane
On Mon, 9 Oct 2000, Tom Lane wrote:
The Hermit Hacker <scrappy@hub.org> writes:
What happens if you crash partway through?
what happens if you crash partway through a vacuum?
Nothing. Vacuum is crash-safe. ALTER TABLE should be too.
Sorry, that's what I meant ... why should marking a column as 'deleted'
and running a 'vacuum' to clean up the physical table be any less
crash-safe?
On Mon, 9 Oct 2000, Tom Lane wrote:
The Hermit Hacker <scrappy@hub.org> writes:
What happens if you crash partway through?
what happens if you crash partway through a vacuum?
Nothing. Vacuum is crash-safe. ALTER TABLE should be too.
Sorry, that's what I meant ... why should marking a column as 'deleted'
and running a 'vacuum' to clean up the physical table be any less
crash-safe?
It is not. The only downside is 2x disk space to make new versions of
the tuple.
On Mon, 9 Oct 2000, Bruce Momjian wrote:
On Mon, 9 Oct 2000, Tom Lane wrote:
The Hermit Hacker <scrappy@hub.org> writes:
What happens if you crash partway through?
what happens if you crash partway through a vacuum?
Nothing. Vacuum is crash-safe. ALTER TABLE should be too.
Sorry, that's what I meant ... why should marking a column as 'deleted'
and running a 'vacuum' to clean up the physical table be any less
crash-safe?

It is not. The only downside is 2x disk space to make new versions of
the tuple.
huh? vacuum moves/cleans up tuples, as well as compresses them, so that
the end result is a smaller table than what it started with, at/with very
little increase in the total size/space needed to perform the vacuum ...
if we reduced vacuum such that it compressed at the field level vs tuple,
we could move a few tuples to the end of the table (crash safe) and then
move N+1 to position 1 minus that extra field. If we mark the column as
being deleted, then if the system crashes part way through, it should be
possible to continue after the system is brought up, no?
Sorry, that's what I meant ... why should marking a column as 'deleted'
and running a 'vacuum' to clean up the physical table be any less
crash-safe?

It is not. The only downside is 2x disk space to make new versions of
the tuple.

huh? vacuum moves/cleans up tuples, as well as compresses them, so that
the end result is a smaller table than what it started with, at/with very
little increase in the total size/space needed to perform the vacuum ...

if we reduced vacuum such that it compressed at the field level vs tuple,
we could move a few tuples to the end of the table (crash safe) and then
move N+1 to position 1 minus that extra field. If we mark the column as
being deleted, then if the system crashes part way through, it should be
possible to continue after the system is brought up, no?
If it crashes in the middle, some rows have the column removed, and some
do not.