updating all records of a table

Started by Gauthier, Daveabout 15 years ago10 messagesgeneral
Jump to latest
#1Gauthier, Dave
dave.gauthier@intel.com

Hi:

I have to update all the records of a table. I'm worried about what the table will look like in terms of fragmentation when this is finished. Is there some sort of table healing/reorg/rebuild measure I should take if I want the resulting table to operate at optimal efficiency? What about indexes, should I drop/recreate those?

(I remember the bad-ole days with Oracle where table defragging and index rebuilding was something we had to do)

Thanks for any help !

#2Joshua D. Drake
jd@commandprompt.com
In reply to: Gauthier, Dave (#1)
Re: updating all records of a table

On Thu, 2011-03-03 at 20:03 -0700, Gauthier, Dave wrote:

Hi:

I have to update all the records of a table. I'm worried about what
the table will look like in terms of fragmentation when this is
finished. Is there some sort of table healing/reorg/rebuild measure I
should take if I want the resulting table to operate at optimal
efficiency? What about indexes, should I drop/recreate those?

Well it depends on the size of table but yes it is going to create a lot
of dead space. A cluster or reindex of the table will solve this for
you.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt

#3Andrew Sullivan
ajs@crankycanuck.ca
In reply to: Gauthier, Dave (#1)
Re: updating all records of a table

On Thu, Mar 03, 2011 at 08:03:59PM -0700, Gauthier, Dave wrote:

Hi:

I have to update all the records of a table. I'm worried about what the table will look like in terms of fragmentation when this is finished. Is there some sort of table healing/reorg/rebuild measure I should take if I want the resulting table to operate at optimal efficiency? What about indexes, should I drop/recreate those?

Is it really important that it happen in one transaction?

In the past when I've had to do this on large numbers of rows, I
always tried to do it in batches. You can run vacuums in between
groups, so that the table doesn't get too bloated.

Otherwise, yeah, you're better off to do some of the cleanup Joshua
suggested.

A

--
Andrew Sullivan
ajs@crankycanuck.ca

#4Vibhor Kumar
vibhor.kumar@enterprisedb.com
In reply to: Andrew Sullivan (#3)
Re: updating all records of a table

On Mar 4, 2011, at 5:17 PM, Andrew Sullivan wrote:

On Thu, Mar 03, 2011 at 08:03:59PM -0700, Gauthier, Dave wrote:

Hi:

I have to update all the records of a table. I'm worried about what the table will look like in terms of fragmentation when this is finished. Is there some sort of table healing/reorg/rebuild measure I should take if I want the resulting table to operate at optimal efficiency? What about indexes, should I drop/recreate those?

Is it really important that it happen in one transaction?

In the past when I've had to do this on large numbers of rows, I
always tried to do it in batches. You can run vacuums in between
groups, so that the table doesn't get too bloated.

Otherwise, yeah, you're better off to do some of the cleanup Joshua
suggested.

A

+1

If UPDATE is for all rows, then
1. CTAS with change value in SELECT
2. Rename the tables. -- This will give zero Bloats.

Thanks & Regards,
Vibhor Kumar
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
vibhor.kumar@enterprisedb.com
Blog:http://vibhork.blogspot.com

#5Rob Sargent
robjsargent@gmail.com
In reply to: Vibhor Kumar (#4)
Re: updating all records of a table

On 03/04/2011 04:54 AM, Vibhor Kumar wrote:

On Mar 4, 2011, at 5:17 PM, Andrew Sullivan wrote:

On Thu, Mar 03, 2011 at 08:03:59PM -0700, Gauthier, Dave wrote:

Hi:

I have to update all the records of a table. I'm worried about what the table will look like in terms of fragmentation when this is finished. Is there some sort of table healing/reorg/rebuild measure I should take if I want the resulting table to operate at optimal efficiency? What about indexes, should I drop/recreate those?

Is it really important that it happen in one transaction?

In the past when I've had to do this on large numbers of rows, I
always tried to do it in batches. You can run vacuums in between
groups, so that the table doesn't get too bloated.

Otherwise, yeah, you're better off to do some of the cleanup Joshua
suggested.

A

+1

If UPDATE is for all rows, then
1. CTAS with change value in SELECT
2. Rename the tables. -- This will give zero Bloats.

Elegant, but of course, this doubles the disk space consumed. Not
always tenable.

#6Gauthier, Dave
dave.gauthier@intel.com
In reply to: Rob Sargent (#5)
Re: updating all records of a table

I like the "cluster" and "reindex" ideas. The table is not that big and I do have the disk space. This table will also grow over time, so if the table ends up taking more space in the end, that's OK, it'll get used. The DB will also be unavailable to the users while this is happening, so I won't have to be contending with interactive users.

Thanks for all the suggestions!

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Rob Sargent
Sent: Friday, March 04, 2011 10:21 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] updating all records of a table

On 03/04/2011 04:54 AM, Vibhor Kumar wrote:

On Mar 4, 2011, at 5:17 PM, Andrew Sullivan wrote:

On Thu, Mar 03, 2011 at 08:03:59PM -0700, Gauthier, Dave wrote:

Hi:

I have to update all the records of a table. I'm worried about what the table will look like in terms of fragmentation when this is finished. Is there some sort of table healing/reorg/rebuild measure I should take if I want the resulting table to operate at optimal efficiency? What about indexes, should I drop/recreate those?

Is it really important that it happen in one transaction?

In the past when I've had to do this on large numbers of rows, I
always tried to do it in batches. You can run vacuums in between
groups, so that the table doesn't get too bloated.

Otherwise, yeah, you're better off to do some of the cleanup Joshua
suggested.

A

+1

If UPDATE is for all rows, then
1. CTAS with change value in SELECT
2. Rename the tables. -- This will give zero Bloats.

Elegant, but of course, this doubles the disk space consumed. Not
always tenable.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Chris Browne
cbbrowne@acm.org
In reply to: Gauthier, Dave (#1)
Re: updating all records of a table

robjsargent@gmail.com (Rob Sargent) writes:

On 03/04/2011 04:54 AM, Vibhor Kumar wrote:

On Mar 4, 2011, at 5:17 PM, Andrew Sullivan wrote:

On Thu, Mar 03, 2011 at 08:03:59PM -0700, Gauthier, Dave wrote:

Hi:

I have to update all the records of a table. I'm worried about
what the table will look like in terms of fragmentation when this
is finished. Is there some sort of table healing/reorg/rebuild
measure I should take if I want the resulting table to operate at
optimal efficiency? What about indexes, should I drop/recreate
those?

Is it really important that it happen in one transaction?

In the past when I've had to do this on large numbers of rows, I
always tried to do it in batches. You can run vacuums in between
groups, so that the table doesn't get too bloated.

Otherwise, yeah, you're better off to do some of the cleanup Joshua
suggested.

A

+1

If UPDATE is for all rows, then
1. CTAS with change value in SELECT
2. Rename the tables. -- This will give zero Bloats.

Elegant, but of course, this doubles the disk space consumed. Not
always tenable.

... But if you needed to do it in one Swell Foop, there really wasn't
any other choice.

The only way *not* to double (or more) space consumption is to do
incremental updates, vacuuming around each increment.
--
select 'cbbrowne' || '@' || 'linuxdatabases.info';
http://www3.sympatico.ca/cbbrowne/lisp.html
We are MICROS~1. You will be assimilated. Resistance is futile.
(Attributed to B.G., Gill Bates)

#8ray
ray@aarden.us
In reply to: Gauthier, Dave (#1)
Re: updating all records of a table

This has been a great thread! I am missing something because I do not
know what CTAS is. WOuld someone please help me understand.

ray

#9Martijn van Oosterhout
kleptog@svana.org
In reply to: ray (#8)
Re: updating all records of a table

On Sat, Mar 05, 2011 at 07:38:23AM -0800, ray wrote:

This has been a great thread! I am missing something because I do not
know what CTAS is. WOuld someone please help me understand.

Create Table As Select.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
- Charles de Gaulle

#10Willy-Bas Loos
willybas@gmail.com
In reply to: Martijn van Oosterhout (#9)
Re: updating all records of a table

you might consider lowering the fillfactor a bit. It will consume more
space, but it will make updates and inserts faster.

http://www.postgresql.org/docs/8.4/interactive/sql-createtable.html :
fillfactor (integer)

The fillfactor for a table is a percentage between 10 and 100. 100 (complete
packing) is the default. When a smaller fillfactor is specified,
INSERT operations
pack table pages only to the indicated percentage; the remaining space on
each page is reserved for updating rows on that page. This gives UPDATE a
chance to place the updated copy of a row on the same page as the original,
which is more efficient than placing it on a different page. For a table
whose entries are never updated, complete packing is the best choice, but in
heavily updated tables smaller fillfactors are appropriate. This parameter
cannot be set for TOAST tables.
--
"Patriotism is the conviction that your country is superior to all others
because you were born in it." -- George Bernard Shaw