Do non-sequential primary keys slow performance significantly??

Started by Damian Carey over 19 years ago · 6 messages · general
#1 Damian Carey
jamianb@gmail.com

Hello,
The most difficult part of this question is justifying WHY we would
want to use random primary keys! We have a fairly strong reason for
doing so, although not quite a compelling one.

We are Java developers developing desktop applications that persist
data in postgres. This is a pretty "low spec" database as it will only
be servicing a few PCs. We do this via Hibernate so our SQL & Postgres
skills and insights are relatively lacking. I certainly don't really
understand the gory internal details of postgres.

We have an internal proposal to use what are virtually random 128-bit
numbers for our primary keys. These are not truly random in any
mathematical sense, and they will be unique, but they are certainly
NOT sequential.

In my ignorant bliss I would suspect that postgres will run more
slowly using random primary keys. Can anyone provide any "rules of
thumb" for how this may affect performance?? Is it a plain dumb
idea?? Or maybe it would have only modest impact??

Any comments, insights, pointers are very much appreciated,

Thanks,
-Damian
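[Editor's note: the thread never shows how these 128-bit keys are generated. As a hypothetical illustration only, `java.util.UUID.randomUUID()` is one common way a Java application produces such non-sequential 128-bit identifiers:]

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class RandomKeyDemo {
    // Generate n version-4 UUIDs and count how many are distinct.
    static int distinctRandomKeys(int n) {
        Set<UUID> keys = new HashSet<>();
        for (int i = 0; i < n; i++) {
            keys.add(UUID.randomUUID());
        }
        return keys.size();
    }

    public static void main(String[] args) {
        // UUID.randomUUID() produces a version-4 UUID: a 128-bit value
        // with 122 bits drawn from a cryptographically strong source.
        // The values are unique in practice but in no way sequential.
        UUID key = UUID.randomUUID();
        System.out.println("candidate key: " + key);
        System.out.println("distinct out of 1000: " + distinctRandomKeys(1000));
    }
}
```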

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Damian Carey (#1)
Re: Do non-sequential primary keys slow performance significantly??

"Damian C" <jamianb@gmail.com> writes:

In my ignorant bliss I would suspect that postgres will run more
slowly using random primary keys.

More slowly compared to what?

If your performance bottleneck is concurrent insertions, random keys
should win over sequential keys because the insert load is distributed
over the whole index, leading to less page-level lock contention.
There might be other scenarios where sequential keys are better.

For a database servicing "only a few PCs" I'm not sure you should even
spend any time thinking about it --- do what's easiest for your
application code.

regards, tom lane

#3 Shane Ambler
pgsql@007Marketing.com
In reply to: Damian Carey (#1)
Re: Do non-sequential primary keys slow performance

On 29/9/2006 14:59, "Damian C" <jamianb@gmail.com> wrote:

Hello,
The most difficult part of this question is justifying WHY we would
want to use random primary keys! We have a fairly strong reason for
doing so, although not quite a compelling one.

Either you want it that way simply because it was your first thought of
how to do it, or you have real reasoning behind doing it that way.

We are Java developers developing desktop applications that persist
data in postgres. This is a pretty "low spec" database as it will only
be servicing a few PCs. We do this via Hibernate so our SQL & Postgres
skills and insights are relatively lacking. I certainly don't really
understand the gory internal details of postgres.

We have an internal proposal to use what are virtually random 128-bit
numbers for our primary keys. These are not truly random in any
mathematical sense, and they will be unique, but they are certainly
NOT sequential.

Specifying it as the primary key or a unique index will ensure the
uniqueness of entries.

In my ignorant bliss I would suspect that postgres will run more
slowly using random primary keys. Can anyone provide any "rules of
thumb" for how this may affect performance?? Is it a plain dumb
idea?? Or maybe it would have only modest impact??

Any comments, insights, pointers are very much appreciated,

As for being non-sequential, this is how a live database will end up
anyway (to some degree): as rows get deleted, their auto-assigned
sequential numbers are deleted along with them and are not re-used,
leaving gaps.

I believe the only performance difference will stem from using a 128-bit
column for the key versus, say, a 32- or 64-bit serial: larger data to
store, search through, and compare.

When you say that you want it as your primary key, how much will you use
it to link to other tables? If it links to 3 other tables it wouldn't be
a problem, but if you use it to link 50 tables with complex joins in
your searches then you may want to reconsider or develop a workaround.
Or is it simply your unique row identifier?

The fact that you say it will only service a few PCs leads me to think
that you aren't expecting millions of rows to be entered. That suggests
the performance difference would not be intolerable, even if it is
noticeable to your application and you could get better performance with
a different design.

--

Shane Ambler
Postgres@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

#4 Richard Broersma Jr
rabroersma@yahoo.com
In reply to: Damian Carey (#1)
Re: Do non-sequential primary keys slow performance significantly??

The most difficult part of this question is justifying WHY we would
want to use random primary keys! We have a fairly strong reason for
doing so, although not quite a compelling one.

One problem with randomly generated primary keys that I've recently
read about is inserts failing due to primary key duplication.

If the size of your dataset grows to become a significant percentage
of the range of the integer type used for your random primary key,
the probability of generating a duplicate number increases
dramatically. I imagine this could contribute to poor performance for
large bulk inserts, which would have to add logic for retrying an
insert whenever a duplicate number is generated.
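[Editor's note: this duplication risk is the classic birthday problem. A quick sketch, not from the thread, using the standard approximation that n uniform draws from a space of size N collide with probability about 1 - e^(-n(n-1)/2N):]

```java
public class CollisionOdds {
    // Birthday-problem approximation: probability that n keys drawn
    // uniformly at random from a 2^bits space contain a duplicate.
    // expm1 keeps precision when the probability is tiny.
    static double collisionProbability(double n, double bits) {
        double N = Math.pow(2.0, bits);
        return -Math.expm1(-n * (n - 1) / (2.0 * N));
    }

    public static void main(String[] args) {
        // With 32-bit random keys, duplicates become likely surprisingly fast.
        System.out.printf("32-bit keys, 100k rows: %.3f%n",
                collisionProbability(100_000, 32));
        // With 128-bit random keys, even a billion rows is effectively risk-free.
        System.out.printf("128-bit keys, 1e9 rows: %.2e%n",
                collisionProbability(1e9, 128));
    }
}
```

This is why the retry logic Richard describes matters for small key types but is essentially theoretical for 128-bit keys.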

Regards,

Richard Broersma Jr.

#5 Brandon Aiken
BAiken@winemantech.com
In reply to: Damian Carey (#1)
Re: [NOVICE] Do non-sequential primary keys slow performance significantly??

I would expect no performance difference at all. All primary keys
automatically get an index, and the index is effectively an optimized
map (in Postgres, a B-tree) from key values to the addresses of the
records holding those keys. Indexes are designed to eliminate the
physical performance penalty of arbitrarily large and variable data
sets.

My only trepidation is using unpredictable values for primary keys.
Certainly they're candidate keys and should be unique in the table, but
I wouldn't be comfortable using an unpredictable value as a primary key.
A surrogate key combined with a unique constraint on your random field
seems like a better choice here, but that's entirely a subjective
opinion.

--
Brandon Aiken
CS/IT Systems Engineer

#6 Bruno Wolff III
bruno@wolff.to
In reply to: Richard Broersma Jr (#4)
Re: Do non-sequential primary keys slow performance significantly??

On Fri, Sep 29, 2006 at 08:10:23 -0700,
Richard Broersma Jr <rabroersma@yahoo.com> wrote:

The most difficult part of this question is justifying WHY we would
want to use random primary keys! We have a fairly strong reason for
doing so, although not quite a compelling one.

One problem with randomly generated primary keys that I've recently
read about is inserts failing due to primary key duplication.

If the size of your dataset grows to become a significant percentage
of the range of the integer type used for your random primary key,
the probability of generating a duplicate number increases
dramatically. I imagine this could contribute to poor performance for
large bulk inserts, which would have to add logic for retrying an
insert whenever a duplicate number is generated.

They are using 128 bit keys! If their random number generator actually
works, they shouldn't have a problem until they have generated on the order
of 2^64 keys. That isn't likely to happen any time soon.
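[Editor's note: Bruno's 2^64 figure is the usual birthday bound: collisions in an N-value keyspace become likely only around sqrt(N) draws. A rough check of that number, my own sketch rather than anything from the thread:]

```java
public class BirthdayBound {
    // Collision probability after n uniform draws from a 2^bits space,
    // via the birthday approximation 1 - exp(-n^2 / (2N)).
    static double collisionProbability(double n, double bits) {
        double N = Math.pow(2.0, bits);
        return -Math.expm1(-n * n / (2.0 * N));
    }

    public static void main(String[] args) {
        // At sqrt(2^128) = 2^64 draws, n^2/(2N) = 2^128/2^129 = 0.5,
        // so the collision probability is about 1 - e^-0.5, roughly 39%.
        double p = collisionProbability(Math.pow(2, 64), 128);
        System.out.printf("collision probability after 2^64 keys: %.3f%n", p);
    }
}
```

So even at the 2^64 mark the odds of ever having seen a duplicate are well under one half, which supports Bruno's point that a working generator makes duplicates a non-issue at any realistic scale.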