Primary key vs unique index
Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables that have significant amounts of updates performed on them, as a result the primary key indexes are becoming significantly bloated. There are other indexes on the tables that also become bloated as a result of this, but these are automatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to be very low.
We don't want to remove the unique constraint of that the primary key is providing, but the space on disk will continue to grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuilt concurrently, or would this be considered bad design? The reasoning behind this would be that the unique index could be rebuilt concurrently without taking the application down or exclusively locking the table for an extending period of time. Are there other advantages to a primary key outside of a uniqueness constraint and an index?
On Mar 17, 2011, at 6:19 AM, Voils, Steven M wrote:
Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables that have significant amounts of updates performed on them, as a result the primary key indexes are becoming significantly bloated. There are other indexes on the tables that also become bloated as a result of this, but these are automatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to be very low.
We don’t want to remove the unique constraint of that the primary key is providing, but the space on disk will continue to grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuilt concurrently, or would this be considered bad design? The reasoning behind this would be that the unique index could be rebuilt concurrently without taking the application down or exclusively locking the table for an extending period of time. Are there other advantages to a primary key outside of a uniqueness constraint and an index?
So, you can't use REINDEX because it lacks the CONCURRENTLY option, and that would lock too much for too long?
The only thing the primary key designation provides beyond not null & unique is the metadata about what is the primary key. Which for example in the db allows foreign key constraints to be created without specifying that column. And some ORM/apps/frameworks can automatically make use of the information as well. I like having them for clarity, but you really can do away with them if your deployment needs to do so.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
Is there a fundamental difference between a primary key and a unique index?
Currently we have primary keys on tables that have significant amounts of
updates performed on them, as a result the primary key indexes are becoming
significantly bloated. There are other indexes on the tables that also
become bloated as a result of this, but these are automatically rebuild
periodically by the application (using the concurrently flag) when read
usage is expected to be very low.
If you're experiencing bloat, but not deleting huge chunks of your
table at a time, then you're not vacuuming aggressively enough
On Thu, Mar 17, 2011 at 12:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
Is there a fundamental difference between a primary key and a unique index?
Currently we have primary keys on tables that have significant amounts of
updates performed on them, as a result the primary key indexes are becoming
significantly bloated. There are other indexes on the tables that also
become bloated as a result of this, but these are automatically rebuild
periodically by the application (using the concurrently flag) when read
usage is expected to be very low.If you're experiencing bloat, but not deleting huge chunks of your
table at a time, then you're not vacuuming aggressively enough
Or you're on 8.3 or before and blowing out your free space map.
Thanks for the reply, that's what I was looking for. I just wasn't sure if there was another compelling advantage to use primary keys instead of a unique index.
-----Original Message-----
From: Scott Ribe [mailto:scott_ribe@elevated-dev.com]
Sent: Thursday, March 17, 2011 12:13 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index
On Mar 17, 2011, at 6:19 AM, Voils, Steven M wrote:
Is there a fundamental difference between a primary key and a unique index? Currently we have primary keys on tables that have significant amounts of updates performed on them, as a result the primary key indexes are becoming significantly bloated. There are other indexes on the tables that also become bloated as a result of this, but these are automatically rebuild periodically by the application (using the concurrently flag) when read usage is expected to be very low.
We don't want to remove the unique constraint of that the primary key is providing, but the space on disk will continue to grow unbounded so we must do something. Can we replace the primary key with a unique index that could be rebuilt concurrently, or would this be considered bad design? The reasoning behind this would be that the unique index could be rebuilt concurrently without taking the application down or exclusively locking the table for an extending period of time. Are there other advantages to a primary key outside of a uniqueness constraint and an index?
So, you can't use REINDEX because it lacks the CONCURRENTLY option, and that would lock too much for too long?
The only thing the primary key designation provides beyond not null & unique is the metadata about what is the primary key. Which for example in the db allows foreign key constraints to be created without specifying that column. And some ORM/apps/frameworks can automatically make use of the information as well. I like having them for clarity, but you really can do away with them if your deployment needs to do so.
--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
(303) 722-0567 voice
Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table which I thought was the cause of the bloat. We are already performing automatic vacuums nightly.
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thursday, March 17, 2011 2:52 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index
On Thu, Mar 17, 2011 at 12:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Thu, Mar 17, 2011 at 6:19 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
Is there a fundamental difference between a primary key and a unique index?
Currently we have primary keys on tables that have significant amounts of
updates performed on them, as a result the primary key indexes are becoming
significantly bloated. There are other indexes on the tables that also
become bloated as a result of this, but these are automatically rebuild
periodically by the application (using the concurrently flag) when read
usage is expected to be very low.If you're experiencing bloat, but not deleting huge chunks of your
table at a time, then you're not vacuuming aggressively enough
Or you're on 8.3 or before and blowing out your free space map.
On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
<steve@sensorswitch.com> wrote:
Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table which I thought was the cause of the bloat. We are already performing automatic vacuums nightly.
Automatic regular vacuums? So you do or don't have autovac turned on?
What version of pg are you running (8.3 or before, 8.4 or later?)
Are your nightly vacuums FULL or regular vacuums?
Our application supports pg 8.4 onwards. The vacuums are run automatically by the application rather than requiring end users to enable autovacuum; the vacuums being performed are regular.
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thursday, March 17, 2011 6:31 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index
On Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
<steve@sensorswitch.com> wrote:
Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table which I thought was the cause of the bloat. We are already performing automatic vacuums nightly.
Automatic regular vacuums? So you do or don't have autovac turned on?
What version of pg are you running (8.3 or before, 8.4 or later?)
Are your nightly vacuums FULL or regular vacuums?
Autovacuum is generally more effective as it can run when it needs to
not having to wait til the end of the day. If you delete big chunks
several times a day autovac can keep up. Also, it's enabled by
default in 8.4 and up so the end user would have to actively turn it
off in this instance.
On Fri, Mar 18, 2011 at 6:21 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
Our application supports pg 8.4 onwards. The vacuums are run automatically by the application rather than requiring end users to enable autovacuum; the vacuums being performed are regular.
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thursday, March 17, 2011 6:31 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique indexOn Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
<steve@sensorswitch.com> wrote:Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table which I thought was the cause of the bloat. We are already performing automatic vacuums nightly.
Automatic regular vacuums? So you do or don't have autovac turned on?
What version of pg are you running (8.3 or before, 8.4 or later?)
Are your nightly vacuums FULL or regular vacuums?
--
To understand recursion, one must first understand recursion.
What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for the newer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sections of the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway?
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Friday, March 18, 2011 8:36 AM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index
Autovacuum is generally more effective as it can run when it needs to
not having to wait til the end of the day. If you delete big chunks
several times a day autovac can keep up. Also, it's enabled by
default in 8.4 and up so the end user would have to actively turn it
off in this instance.
On Fri, Mar 18, 2011 at 6:21 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
Our application supports pg 8.4 onwards. The vacuums are run automatically by the application rather than requiring end users to enable autovacuum; the vacuums being performed are regular.
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Thursday, March 17, 2011 6:31 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique indexOn Thu, Mar 17, 2011 at 12:59 PM, Voils, Steven M
<steve@sensorswitch.com> wrote:Thanks for the reply. I should have mentioned in the first post that we do delete significant amounts of the table which I thought was the cause of the bloat. We are already performing automatic vacuums nightly.
Automatic regular vacuums? So you do or don't have autovac turned on?
What version of pg are you running (8.3 or before, 8.4 or later?)
Are your nightly vacuums FULL or regular vacuums?
--
To understand recursion, one must first understand recursion.
On Fri, Mar 18, 2011 at 8:38 AM, Voils, Steven M <steve@sensorswitch.com>wrote:
What are the general guidelines under which autovacuum will trigger? I was
unaware it was turned on by default for the newer versions. Would it be
worthwhile to leave the manual vacuuming on? Currently it runs immediately
after large sections of the tables are deleted. Or would it be expected
that autovac would pick these changes up and run anyway?
Autovacuum will tend to run after those types of changes. As described
here:
http://www.postgresql.org/docs/8.4/interactive/routine-vacuuming.html#AUTOVACUUM
What PostgreSQL looks for is a portion of the tuples to be obsoleted. The
configuration is essentially "if some portion (percentage) of the table is
obsolete, vacuum it" but also has an added scalar (base threshold) which is
required on top of that portion (scale factor).
My understanding is that the base threshold is there to prevent small tables
from being vacuumed for little or no reason, but for large tables it should
be insignificant in comparison to the scale factor. So if your scale factor
is .5, when you delete half of your table, you can expect an autovacuum to
run on the next iteration of the daemon.
Note that the default scale factor is .2 (20%) and the default base
threshold is 50. Both can be modified for the cluster as well as for
individual tables.
As someone else already alluded, VACUUM FULL is generally bad for indexes,
where VACUUM will help indexes (by reclaiming space the same way as it does
for the table). More details on the same page linked above.
Derrick
On Fri, Mar 18, 2011 at 6:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for the newer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sections of the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway?
Until 8.3 autovacuum was more of a proof of concept rather than
production ready code. By 8.3 two things had happened, vacuum
costing, which is important so you can tune vacuuming / autovacuuming
to your hardware and usage patterns, and multi-threaded autovacuuming
daemon, which meant that autovac could now handle the scenario where
one or more table would take a long (sometimes very long) time to
vacuum, especially with costing factors slowing it down, and another
table would get bloated while waiting its turn. With a server with
LOTS of random IO capability you can run quite a few threads at once,
since each one is only a small impact against the maximum IO of the
drive array. If you've got 1,000 tables and a couple dozen big ones
that can take 30 minutes or more to vacuum, it's a good thing to be
able to run autovac on more than one at a time.
The next HUGE improvement came with 8.4, which took the free space map
and put it on the drives, removing the need to constantly monitor and
adjust free space map to prevent blowout. If you've got a well tuned
<= pg 8.3 you're ok. If you need to tune an older version, it's often
easier AND safer to migrate to 8.4 or above.
Thanks for the help. I think we're only going to support pg 8.4 onwards. I'll turn off the application's manual vacuuming and just let autovac do its thing.
-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Monday, March 21, 2011 9:13 PM
To: Voils, Steven M
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Primary key vs unique index
On Fri, Mar 18, 2011 at 6:38 AM, Voils, Steven M <steve@sensorswitch.com> wrote:
What are the general guidelines under which autovacuum will trigger? I was unaware it was turned on by default for the newer versions. Would it be worthwhile to leave the manual vacuuming on? Currently it runs immediately after large sections of the tables are deleted. Or would it be expected that autovac would pick these changes up and run anyway?
Until 8.3 autovacuum was more of a proof of concept rather than
production ready code. By 8.3 two things had happened, vacuum
costing, which is important so you can tune vacuuming / autovacuuming
to your hardware and usage patterns, and multi-threaded autovacuuming
daemon, which meant that autovac could now handle the scenario where
one or more table would take a long (sometimes very long) time to
vacuum, especially with costing factors slowing it down, and another
table would get bloated while waiting its turn. With a server with
LOTS of random IO capability you can run quite a few threads at once,
since each one is only a small impact against the maximum IO of the
drive array. If you've got 1,000 tables and a couple dozen big ones
that can take 30 minutes or more to vacuum, it's a good thing to be
able to run autovac on more than one at a time.
The next HUGE improvement came with 8.4, which took the free space map
and put it on the drives, removing the need to constantly monitor and
adjust free space map to prevent blowout. If you've got a well tuned
<= pg 8.3 you're ok. If you need to tune an older version, it's often
easier AND safer to migrate to 8.4 or above.
We are experiencing a similar problem, even though we are on 8.4 and have
been for a while, and have autovacuum turned on. I have regular concurrent
reindexes on the indexes but the primary key is seriously bloated. I was
considering doing the same thing, that is, create another primary key that
is built on a sequence ( primarily for slony) and then change my current
multi-column primary key to a unique index. Have you been able to work
around the problem in any other way?
Thanks
RV
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715712.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Fri, Jul 06, 2012 at 09:07:53AM -0700, rverghese wrote:
We are experiencing a similar problem, even though we are on 8.4 and have
been for a while, and have autovacuum turned on. I have regular concurrent
reindexes on the indexes but the primary key is seriously bloated. I was
considering doing the same thing, that is, create another primary key that
is built on a sequence ( primarily for slony) and then change my current
multi-column primary key to a unique index. Have you been able to work
around the problem in any other way?
http://www.depesz.com/2011/07/06/bloat-happens/
depesz
--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Bloat in primary key indexes has been a long standing issue (although not
faced by many), and especially since online rebuild of primary keys was
never possible in production environments.
Since version 9.1 we have a nice little feature of being able to change a
primary key's underlying index. Look at the 'table_constraint_using_index'
clause in ALTER TABLE docs [1]http://www.postgresql.org/docs/9.1/static/sql-altertable.html. And example in the same doc specifically
shows how to solve the problem in just two commands:
<quote>
To recreate a primary key constraint, without blocking updates while the
index is rebuilt:
CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON distributors (dist_id);
ALTER TABLE distributors DROP CONSTRAINT distributors_pkey,
ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX
dist_id_temp_idx;
</quote>
[1]: http://www.postgresql.org/docs/9.1/static/sql-altertable.html
Best regards,
PS: Shameless plug: I am credited for this feature :)
http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107778
On Fri, Jul 6, 2012 at 12:07 PM, rverghese <riyav@hotmail.com> wrote:
We are experiencing a similar problem, even though we are on 8.4 and have
been for a while, and have autovacuum turned on. I have regular concurrent
reindexes on the indexes but the primary key is seriously bloated. I was
considering doing the same thing, that is, create another primary key that
is built on a sequence ( primarily for slony) and then change my current
multi-column primary key to a unique index. Have you been able to work
around the problem in any other way?Thanks
RV--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715712.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Yes I am using that option for one of my POstgres 9.1 database and it works
well. But its still an issue with Foreign keys, which you need to drop and
recreate . Also I use Slony for replication and it uses the primary key to
check repl. So I don't want that to be interrupted by dropping PK and
recreating PK.
Thanks
RV
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Primary-key-vs-unique-index-tp3883778p5715729.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On Fri, Jul 6, 2012 at 2:07 PM, rverghese <riyav@hotmail.com> wrote:
Yes I am using that option for one of my POstgres 9.1 database and it works
well. But its still an issue with Foreign keys, which you need to drop and
recreate .
Having to drop and create foriegn keys is a legitimate concern. I am
looking into improving that.
Also I use Slony for replication and it uses the primary key to
check repl. So I don't want that to be interrupted by dropping PK and
recreating PK.
If you look closely at that example, DROP and CREATE of the primary key is
being done in one command (and hence one transaction), so anything that
depends this constraint should not be affected except from the fact that
this table will be locked in exclusive mode for the duration of this
operation, which should be very short.
Best regards,
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company