BUG #10591: setting newly added columns to null is slow
The following bug has been logged on the website:
Bug reference: 10591
Logged by: Eike Dierks
Email address: eike@inter.net
PostgreSQL version: 9.2.1
Operating system: OSX 10.9, centos on virtuozzo linux-2.6.18-028stab
Description:
- given demo_table with some amount of data
- add new_column (works instantely)
- add a unique contraint on new_column
- update demo_table set new_colum=null; (quite slow)
-> takes a long time: seems to rewrite the whole table
I guess that adding a new column does not yet reserve space for that new
column on disk, but that the on-disk layout is only changed as soon as I do
the update (aka making the column physically present on disk)
I fully understand that this is a sensible tradeoff and that this is the
intended behaviour.
However from my uneducated guess (not tested) I believe that a vacuum full
followed by the update would have been much faster.
I somehow believe that my first sequence of commands did make the new_column
virtual first and then physically for every row in turn, thereby taking a
long time to complete.
I'd like to suggest to add to the documentation (if it's not already in
there)
It should be stated that a vacuum full might speed up further access to a
newly introduced column.
But again I did not fully test that scenario, it's just my guess.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
"eike@inter.net" <eike@inter.net> wrote:
- given demo_table with some amount of data
- add new_column (works instantely)
- add a unique contraint on new_column
- update demo_table set new_colum=null; (quite slow)
-> takes a long time: seems to rewrite the whole table
This is not a bug. It has nothing to do with having added the
column. An UPDATE statement with no WHERE clause will delete and
re-add every row in the table, even if your update is setting a
column to the value it already has.
I guess that adding a new column does not yet reserve space for
that new column on disk, but that the on-disk layout is only
changed as soon as I do the update (aka making the column
physically present on disk)
No, the table can immediately be used, and the new column will be
NULL. The UPDATE was completely unnecessary. When in doubt, use a
WHERE clause that ensures you are not updating unnecessarily, like:
UPDATE demo_table
SET new_column = NULL
WHERE new_column IS NOT NULL;
However from my uneducated guess (not tested) I believe that a
vacuum full followed by the update would have been much faster.
It would not have been. VACUUM FULL is not often a good idea;
although you are now in a state where this table is heavily
bloated by the unnecessary update of every row, so it might
actually be worth it now.
http://rhaas.blogspot.com/2014/03/vacuum-full-doesnt-mean-vacuum-but.html
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Eike Dierks wrote
The following bug has been logged on the website:
Bug reference: 10591
Logged by: Eike Dierks
Email address:
eike@
PostgreSQL version: 9.2.1
Operating system: OSX 10.9, centos on virtuozzo linux-2.6.18-028stab
Description:- given demo_table with some amount of data
- add new_column (works instantely)
- add a unique contraint on new_column
- update demo_table set new_colum=null; (quite slow)
-> takes a long time: seems to rewrite the whole tableI guess that adding a new column does not yet reserve space for that new
column on disk, but that the on-disk layout is only changed as soon as I
do
the update (aka making the column physically present on disk)I fully understand that this is a sensible tradeoff and that this is the
intended behaviour.
As you say - this is not a bug...
However from my uneducated guess (not tested) I believe that a vacuum full
followed by the update would have been much faster.
Doubtful and not without its own problems. While the UPDATE could perform
faster the entire transaction will definately take longer as a whole.
I somehow believe that my first sequence of commands did make the
new_column
virtual first and then physically for every row in turn, thereby taking a
long time to complete.
Yes, this is how MVCC (multi-version concurrency control) works.
I'd like to suggest to add to the documentation (if it's not already in
there)
It should be stated that a vacuum full might speed up further access to a
newly introduced column.
This is not a bug and while I understand your confusion if you are going to
suggest that changes are needed is it very helpful to actually suggest what
you think needs to change.
Or, at least, tell us what parts of the documentation you did read so that
we may try and better understand how users are using the documentation and
also point to you anything you may have missed.
But again I did not fully test that scenario, it's just my guess.
If you are not going to actually test your theories you should then at least
provide logical support for your reasoning.
As a final note, a better wording for the message would be: "setting newly
added columns to null is pointless" (or maybe- "is not optimized"). But any
optimization to make this pointless use-case faster would likely slow down
real-world use-cases...
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/BUG-10591-setting-newly-added-columns-to-null-is-slow-tp5806732p5806737.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs