feature request - adding columns with default value

Started by Henrik Steffenabout 23 years ago13 messagesgeneral
Jump to latest
#1Henrik Steffen
steffen@city-map.de

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

thank you

--

Mit freundlichem Gruß

Henrik Steffen
Geschäftsführer

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com Tel. +49 4141 991230
mail: steffen@topconcepts.com Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de
System-Partner gesucht: http://www.franchise.city-map.de
--------------------------------------------------------
Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563
--------------------------------------------------------

#2Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

On Friday 04 April 2003 13:38, you wrote:

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

May be I don't get it right, but altering defaults on live table, would yield
inconsistent data. Some NULLS in past rows and zeros being inserted in new
rows and that is not correct. Am I right here?

Furthermore mass updating close to 450,000 row would give you a nice waste of
dead tuples and your next vacuum will take loooong time unless you are
vacuuming in tight loops while updating.

I recommend you schedule a downtime, dump the table and reload it. This way
there would be no wasted tuples. Any change in database schema should go thr.
a scheduled maintenance , if you ask me.

Shridhar

#3Henrik Steffen
steffen@city-map.de
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

fortunately, the vacuum after the update does only take
approx. 30 seconds thanks to the ultra-fast machine we
are using...

--

Mit freundlichem Gruß

Henrik Steffen
Geschäftsführer

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com Tel. +49 4141 991230
mail: steffen@topconcepts.com Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de
System-Partner gesucht: http://www.franchise.city-map.de
--------------------------------------------------------
Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563
--------------------------------------------------------

----- Original Message -----
From: "Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>
To: "pgsql" <pgsql-general@postgresql.org>
Sent: Friday, April 04, 2003 10:24 AM
Subject: Re: [GENERAL] feature request - adding columns with default value

On Friday 04 April 2003 13:38, you wrote:

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6

minutes

all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

May be I don't get it right, but altering defaults on live table, would

yield

inconsistent data. Some NULLS in past rows and zeros being inserted in new
rows and that is not correct. Am I right here?

Furthermore mass updating close to 450,000 row would give you a nice waste

of

dead tuples and your next vacuum will take loooong time unless you are
vacuuming in tight loops while updating.

I recommend you schedule a downtime, dump the table and reload it. This

way

there would be no wasted tuples. Any change in database schema should go

thr.

Show quoted text

a scheduled maintenance , if you ask me.

Shridhar

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#4Andrew Sullivan
andrew@libertyrms.info
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

On Fri, Apr 04, 2003 at 10:08:50AM +0200, Henrik Steffen wrote:

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

Why do you think that having the back end set the default value on
those other rows won't sugger the very same problem? What esle is it
to do except implicitly iees the same UPDATE you do?

One way around this problem is to write a little script (I use perl
for this, but pick your favourite flavour) which does the updates
1000 at a time. This minimises the locking, so you don't have to
have anything WAITING for 10 minutes.

It also allows you to insert the occasional VACUUM so that your FSM
settings don't have to be huge.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
#5Diogo Biazus
diogo@ikono.com.br
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

Henrik Steffen wrote:

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

thank you

Maybe I'm wrong, but this feature was supported in 7.1,
and it was removed in 7.2.
If what I'm saying is true, I would like to know why was this feature
removed?

--
Diogo de Oliveira Biazus
diogo@ikono.com.br
Ikono Sistemas e Automação
http://www.ikono.com.br

#6Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Diogo Biazus (#5)
Re: feature request - adding columns with default value

On Fri, 4 Apr 2003, Diogo de Oliveira Biazus wrote:

Henrik Steffen wrote:

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

thank you

Maybe I'm wrong, but this feature was supported in 7.1,
and it was removed in 7.2.
If what I'm saying is true, I would like to know why was this feature
removed?

AFAIR, some older versions accepted the syntax but then totally ignored
the extra attributes (ie, you could say something like int2 default 0, but
the default was just silently lost).

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

"Henrik Steffen" <steffen@city-map.de> writes:

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

As someone else already pointed out, allowing "alter table add column"
to specify a default would simply mean the system has to go through
these same steps behind your back. It would *not* be any faster. What
it would be is a lot worse from a concurrency standpoint, because the
"alter table" transaction has to take an exclusive lock, which would
then have to be held throughout the update of the data rows. Doing it
in separate steps allows the exclusive lock to be held only while the
critical catalog updates are performed. The UPDATE part is then just
an ordinary writer that needs no exclusive lock.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

Other transactions should only need to block if they try to update a row
already updated by the "set blah=0" transaction. You might consider
breaking the big update into a series of smaller transactions, say
updating 10% of the rows at a time. (If you vacuumed after each of
these smaller updates, you could also minimize table bloat.)

We will eventually support "add column" with default, but AFAICS it will
only be a convenience feature; if you are concerned about keeping the
table available for concurrent use, the multi-step manual approach will
always win.

regards, tom lane

#8Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

On Fri, 4 Apr 2003, Henrik Steffen wrote:

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

IIRC, ADD COLUMN adds the column to the end of the list, but doesn't
modify the actual rows stored on disk. I think when the column past the
end is accessed on a row that doesn't have it a NULL is returned. SET
DEFAULT just sets the default up but also doesn't modify the actual rows
stored on disk. Presumably SET DEFAULT should probably do the update, but
it's not likely to be appreciably faster than you doing the update.

Other options would include things like updating the rows on select (that
sounds like it could lead to wierd deadlocks between a select and an
update), returning the default rather than NULL for the access (but that
doesn't work for non-immutable defaults unless you update the row during
the select) or building a new copy of the table (requires potentially alot
of disk space). In addition, since the transaction in question probably
has an exclusive lock to the table, the other transactions need to wait
while the alter runs (so a new copy doesn't save you from locking reads
while the table is building unless you drop the lock, which is probably
dangerous). I think this was discussed in the past, so the archives can
probably do a better job than I can (since I wasn't paying all that much
attention).

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Diogo Biazus (#5)
Re: feature request - adding columns with default value

Diogo de Oliveira Biazus <diogo@ikono.com.br> writes:

Henrik Steffen wrote:

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

Maybe I'm wrong, but this feature was supported in 7.1,
and it was removed in 7.2.

THe old implementation was really equivalent to "alter table add column"
followed by "set default", because it didn't touch the actual rows
(the implied new column stayed NULL). That's not what the command
should do per SQL spec, so we took it out until we get around to
implementing the spec-compliant behavior. The current behavior conforms
to spec as far as it goes.

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#8)
Re: feature request - adding columns with default value

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

SET DEFAULT just sets the default up but also doesn't modify the actual rows
stored on disk. Presumably SET DEFAULT should probably do the update,

No, it shouldn't; the existing behavior is per spec. SET DEFAULT just
changes the default for future insertions, it's not supposed to touch
the present table contents.

ALTER TABLE ADD COLUMN with a default clause *is* supposed to fill all
rows of the table with the default. The reason we reject it is we don't
have that behavior implemented.

regards, tom lane

#11Dennis Gearon
gearond@cvc.net
In reply to: Henrik Steffen (#1)
Re: feature request - adding columns with default value

do ANY databases do this? How does it fit the SQL standard. And, whether you do
it manually or the ALTER TABLE command does it, it's still going to block other
updates.

Henrik Steffen wrote:

Show quoted text

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

thank you

--

Mit freundlichem Gruß

Henrik Steffen
Geschäftsführer

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com Tel. +49 4141 991230
mail: steffen@topconcepts.com Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de
System-Partner gesucht: http://www.franchise.city-map.de
--------------------------------------------------------
Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563
--------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#12Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#10)
Re: feature request - adding columns with default value

On Fri, 4 Apr 2003, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

SET DEFAULT just sets the default up but also doesn't modify the actual rows
stored on disk. Presumably SET DEFAULT should probably do the update,

No, it shouldn't; the existing behavior is per spec. SET DEFAULT just
changes the default for future insertions, it's not supposed to touch
the present table contents.

Yeah, realized that didn't make much sense afterwards.

ALTER TABLE ADD COLUMN with a default clause *is* supposed to fill all
rows of the table with the default. The reason we reject it is we don't
have that behavior implemented.

Right. We should probably change the error text then since it doesn't
actually mention the update being necessary (and what it suggests is just
as wrong as the behavior it would have if we just did the default in one
step).

#13Wayne Armstrong
wdarmst@bacchus.com.au
In reply to: Dennis Gearon (#11)
Re: feature request - adding columns with default value

** Reply to message from Dennis Gearon <gearond@cvc.net> on Fri, 04 Apr 2003
08:19:24 -0800
Yeah,
Db2 does.
The reall usefulness of this is the form :-
Alter table add blah varchar(10) not null default 'No'

The backend appears to initialise the new column to the default value (just
about instantly - even for large tables), and you can add a not null column to
the table in one statement.

Regards,
Wayne

Show quoted text

do ANY databases do this? How does it fit the SQL standard. And, whether you do
it manually or the ALTER TABLE command does it, it's still going to block other
updates.

Henrik Steffen wrote:

hello,

when doing an "alter table blub add column blah int2"
and then issuing an "alter table blub alter column blah set default 0"
I will always have to manually "update blub set blah=0" to initialize
all existing records.... this is quite annoying while performing this
update on a running database with 100 simultaneous clients accessing
the database and about 443482 rows in the table.

it takes about 10 minutes to perform this update, and for about 6 minutes
all other updates to the table are in status "waiting", this leads to
a very high load and all max_connection setting is reached soon
(on a Dual-Xeon 2.4 GHz machine with 2 GB RAM)

i would suggest to change the "alter table add column" function to
adding a default value while adding the column if this is possible
and if this minimizes "down-time".

thank you

--

Mit freundlichem Gruß

Henrik Steffen
Geschäftsführer

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com Tel. +49 4141 991230
mail: steffen@topconcepts.com Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline: +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
Ihr SMS-Gateway: JETZT NEU unter: http://sms.city-map.de
System-Partner gesucht: http://www.franchise.city-map.de
--------------------------------------------------------
Handelsregister: AG Stade HRB 5811 - UstId: DE 213645563
--------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html