Primary keys and composite unique keys(basic question)

Started by Mohan Radhakrishnanabout 5 years ago19 messagesgeneral
Jump to latest
#1Mohan Radhakrishnan
radhakrishnan.mohan@gmail.com

Hello,
We have UUIDs in our tables which are primary keys. But in
some cases
we also identify a composite unique key apart from the primary key.

My assumption is that there should be a unique key index created by us
using the composite key. And when we fetch using this composite key instead
of the primary key we have a performance boost due to the index.

Is this right ? Are there more details I should read to understand this
better ? Please point. If I should use the query planner to look at the
statistics I will.
.
Thanks,
Mohan

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mohan Radhakrishnan (#1)
Re: Primary keys and composite unique keys(basic question)

Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com> writes:

We have UUIDs in our tables which are primary keys. But in
some cases
we also identify a composite unique key apart from the primary key.

My assumption is that there should be a unique key index created by us
using the composite key. And when we fetch using this composite key instead
of the primary key we have a performance boost due to the index.

You haven't provided a lot of detail, but use-a-UUID-as-a-primary-key
is often an antipattern. The UUIDs are quasi-random, meaning there's
no locality of reference in the primary key index, resulting in
inefficiency in searches and insertions. If the composite key you
mention has some actual relationship to your application's usage
patterns, it could be winning as a result of better locality of
access to that index.

regards, tom lane

#3Mohan Radhakrishnan
radhakrishnan.mohan@gmail.com
In reply to: Tom Lane (#2)
Re: Primary keys and composite unique keys(basic question)

I will cover the UUIDs first. They are indispensable to us.
1. The data is distributed over regions So we need the row to be unique.
2. This distributed data is sent to services as events. That is the
application architecture.

But we don't search using UUIDs always. Only when data from another
distributed service
is received we need them and in such cases we have to join using them.

But for local data we can identify another composite unique key. Does
PostgreSql
create a unique index for us ? What about a FK that references this
composite
unique key ? Does it create a FK index ?

Thank you.

On Wed, Mar 31, 2021 at 7:42 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Mohan Radhakrishnan <radhakrishnan.mohan@gmail.com> writes:

We have UUIDs in our tables which are primary keys. But in
some cases
we also identify a composite unique key apart from the primary key.

My assumption is that there should be a unique key index created by us
using the composite key. And when we fetch using this composite key

instead

of the primary key we have a performance boost due to the index.

You haven't provided a lot of detail, but use-a-UUID-as-a-primary-key
is often an antipattern. The UUIDs are quasi-random, meaning there's
no locality of reference in the primary key index, resulting in
inefficiency in searches and insertions. If the composite key you
mention has some actual relationship to your application's usage
patterns, it could be winning as a result of better locality of
access to that index.

regards, tom lane

#4Michael Lewis
mlewis@entrata.com
In reply to: Mohan Radhakrishnan (#3)
Re: Primary keys and composite unique keys(basic question)

Etiquette on these lists is to reply in line or below the relevant portion,
not top-post with full quoting like default gmail behavior.

On Wed, Mar 31, 2021 at 9:18 AM Mohan Radhakrishnan <
radhakrishnan.mohan@gmail.com> wrote:

But we don't search using UUIDs always. Only when data from another
distributed service
is received we need them and in such cases we have to join using them.

I haven't used them so I don't recall exactly, but I believe there is a
type of UUID generation which has some leading correlation to time which
would help with reducing the random I/O issue that Tom Lane mentioned. A
quick search of the archive may lead you to that, or someone else may chime
in with the name I expect.

But for local data we can identify another composite unique key. Does
PostgreSql
create a unique index for us ? What about a FK that references this
composite
unique key ? Does it create a FK index ?

It is up to you to create whichever fkeys and indexes you require.

#5Merlin Moncure
mmoncure@gmail.com
In reply to: Mohan Radhakrishnan (#1)
Re: Primary keys and composite unique keys(basic question)

On Wed, Mar 31, 2021 at 3:36 AM Mohan Radhakrishnan
<radhakrishnan.mohan@gmail.com> wrote:

Hello,
We have UUIDs in our tables which are primary keys. But in some cases
we also identify a composite unique key apart from the primary key.

My assumption is that there should be a unique key index created by us using the composite key. And when we fetch using this composite key instead of the primary key we have a performance boost due to the index.

Is this right ? Are there more details I should read to understand this better ? Please point. If I should use the query planner to look at the statistics I will.

This is one of the great debates in computer science and it is not
settled. There are various tradeoffs around using a composite key
derived from the data (aka natural key) vs generated identifiers. It's
a complex topic with many facets: performance, organization,
validation, and correctness are all relevant considerations. I would
never use UUIDS for keys though.

merlin

#6Rob Sargent
robjsargent@gmail.com
In reply to: Merlin Moncure (#5)
Re: Primary keys and composite unique keys(basic question)

On 4/1/21 8:28 PM, Merlin Moncure wrote:

This is one of the great debates in computer science and it is not
settled. There are various tradeoffs around using a composite key
derived from the data (aka natural key) vs generated identifiers. It's
a complex topic with many facets: performance, organization,
validation, and correctness are all relevant considerations. I would
never use UUIDS for keys though.

merlin

And, pray tell, for what exactly would you use universally unique
identifiers.

#7Hemil Ruparel
hemilruparel2002@gmail.com
In reply to: Rob Sargent (#6)
Re: Primary keys and composite unique keys(basic question)

I used uuid4 for customer ids because i needed to interface with payment
providers. Is that wrong? All other places except transaction ids, i have
used serial ints

On Fri 2 Apr, 2021, 8:56 AM Rob Sargent, <robjsargent@gmail.com> wrote:

Show quoted text

On 4/1/21 8:28 PM, Merlin Moncure wrote:

This is one of the great debates in computer science and it is not
settled. There are various tradeoffs around using a composite key
derived from the data (aka natural key) vs generated identifiers. It's
a complex topic with many facets: performance, organization,
validation, and correctness are all relevant considerations. I would
never use UUIDS for keys though.

merlin

And, pray tell, for what exactly would you use universally unique
identifiers.

#8Bertrand Mamasam
golgote@gmail.com
In reply to: Hemil Ruparel (#7)
Re: Primary keys and composite unique keys(basic question)

On Fri, Apr 2, 2021 at 8:57 AM Hemil Ruparel <hemilruparel2002@gmail.com>
wrote:

I used uuid4 for customer ids because i needed to interface with payment
providers. Is that wrong? All other places except transaction ids, i have
used serial ints

On Fri 2 Apr, 2021, 8:56 AM Rob Sargent, <robjsargent@gmail.com> wrote:

On 4/1/21 8:28 PM, Merlin Moncure wrote:

This is one of the great debates in computer science and it is not
settled. There are various tradeoffs around using a composite key
derived from the data (aka natural key) vs generated identifiers. It's
a complex topic with many facets: performance, organization,
validation, and correctness are all relevant considerations. I would
never use UUIDS for keys though.

merlin

And, pray tell, for what exactly would you use universally unique
identifiers.

ULID perform better than UUID with btree indexes. The first part is based
on a timestamp, so if you don't mind exposing this information, they are a
better choice. There is an implementation for postgres here:
https://github.com/geckoboard/pgulid
Maybe there are others.

#9Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Merlin Moncure (#5)
Re: Primary keys and composite unique keys(basic question)

On Thu, 2021-04-01 at 21:28 -0500, Merlin Moncure wrote:

I would never use UUIDS for keys though.

That makes me curious for your reasons.

I see the following disadvantages:

- A UUID requires twice as much storage space as a bigint.

- B-tree indexes are space optimized for inserting at the
rightmost leaf page, but UUIDs are random.

- UUIDs are more expensive to generate.

On the other hand, many processes trying to insert into
the same index page might lead to contention.

Is there anything I have missed?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#10Merlin Moncure
mmoncure@gmail.com
In reply to: Rob Sargent (#6)
Re: Primary keys and composite unique keys(basic question)

On Thu, Apr 1, 2021 at 10:26 PM Rob Sargent <robjsargent@gmail.com> wrote:

On 4/1/21 8:28 PM, Merlin Moncure wrote:

This is one of the great debates in computer science and it is not
settled. There are various tradeoffs around using a composite key
derived from the data (aka natural key) vs generated identifiers. It's
a complex topic with many facets: performance, organization,
validation, and correctness are all relevant considerations. I would
never use UUIDS for keys though.

merlin

And, pray tell, for what exactly would you use universally unique
identifiers.

I don't disagree that UUID are an ok choice in that scenario. I'll
tell you what though, that scenario comes up fairly rarely. However,
there are a couple of alternatives if you're curious.

*) Generate ids from a generator service. This pattern is fairly
common. It has some downsides (slower, more complicated inserts
mainly) but works well in other ways. You can mitigate the
performance downsides by allocated identifiers in blocks.

*) Use sequences, but with a sequence id added as a composite or
maksed into the integer. This works pretty well in practice.

merlin

#11Merlin Moncure
mmoncure@gmail.com
In reply to: Laurenz Albe (#9)
Re: Primary keys and composite unique keys(basic question)

On Fri, Apr 2, 2021 at 3:40 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2021-04-01 at 21:28 -0500, Merlin Moncure wrote:

I would never use UUIDS for keys though.

That makes me curious for your reasons.

I see the following disadvantages:

- A UUID requires twice as much storage space as a bigint.

- B-tree indexes are space optimized for inserting at the
rightmost leaf page, but UUIDs are random.

- UUIDs are more expensive to generate.

On the other hand, many processes trying to insert into
the same index page might lead to contention.

Is there anything I have missed?

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

Performance-wise, UUIDS are absolutely horrible for data at scale as
Tom rightly points out. Everything is randomized, just awful. There
are some alternate implementations of UUID that mitigate this but I've
never seen them used in the wild in actual code.

merlin

#12Rob Sargent
robjsargent@gmail.com
In reply to: Merlin Moncure (#11)
Re: Primary keys and composite unique keys(basic question)

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.

Performance-wise, UUIDS are absolutely horrible for data at scale as
Tom rightly points out. Everything is randomized, just awful. There
are some alternate implementations of UUID that mitigate this but I've
never seen them used in the wild in actual code.

That b-tree’s have been optimized to handle serial ints might be a considered a reaction to that popular (and distasteful) choice. Perhaps there should be a ’non-optimized’ option.

#13Merlin Moncure
mmoncure@gmail.com
In reply to: Rob Sargent (#12)
Re: Primary keys and composite unique keys(basic question)

On Mon, Apr 5, 2021 at 9:37 PM Rob Sargent <robjsargent@gmail.com> wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.

Performance-wise, UUIDS are absolutely horrible for data at scale as
Tom rightly points out. Everything is randomized, just awful. There
are some alternate implementations of UUID that mitigate this but I've
never seen them used in the wild in actual code.

That b-tree’s have been optimized to handle serial ints might be a considered a reaction to that popular (and distasteful) choice. Perhaps there should be a ’non-optimized’ option.

It's not just the BTree, but the heap as well. For large tables, you
are pretty much guaranteed to read a page for each record you want to
load via the key regardless of the pattern of access. It's incredibly
wasteful regardless of the speed of the underlying storage fabric.
Very few developers actually understand this.

If computers were infinitely fast this wouldn't matter, but they aren't :-).

merlin

#14Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#12)
Re: Primary keys and composite unique keys(basic question)

On 4/5/21 9:37 PM, Rob Sargent wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value.  Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus.  Spent too much of my life correcting
others’ use of “remembered” id’s that just happened to perfectly match the
wrong thing.

People seem to have stopped appending check digits to identifiers about 20
years ago, and I'm not sure why.

--
Angular momentum makes the world go 'round.

#15Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#14)
Re: Primary keys and composite unique keys(basic question)

On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:

 On 4/5/21 9:37 PM, Rob Sargent wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus. Spent too much of my life correcting others’ use of “remembered” id’s that just happened to perfectly match the wrong thing.

People seem to have stopped appending check digits to identifiers about 20 years ago, and I'm not sure why.

No the problem is “start from one”. User has item/I’d 10875 in hand and types in 10785 which of course in a sequence supplied ID steam is perfectly valid and wrong. Really hard to track down.

Show quoted text

--
Angular momentum makes the world go 'round.

#16Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#15)
Re: Primary keys and composite unique keys(basic question)

On 4/7/21 11:35 AM, Rob Sargent wrote:

On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:

 On 4/5/21 9:37 PM, Rob Sargent wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value.  Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus.  Spent too much of my life
correcting others’ use of “remembered” id’s that just happened to
perfectly match the wrong thing.

People seem to have stopped appending check digits to identifiers about
20 years ago, and I'm not sure why.

No the problem is “start from one”. User has item/I’d 10875 in hand and
types in 10785 which of course in a sequence supplied ID steam is
perfectly valid and wrong.  Really hard to track down.

That's my point.  Adding a check digit (turning 10875 into 108753) would
have caught that, since 107853 does not match 107854 (which is 10785 with a
check digit added).

--
Angular momentum makes the world go 'round.

#17Rob Sargent
robjsargent@gmail.com
In reply to: Ron (#16)
Re: Primary keys and composite unique keys(basic question)

On 4/7/21 11:59 AM, Ron wrote:

On 4/7/21 11:35 AM, Rob Sargent wrote:

On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:

 On 4/5/21 9:37 PM, Rob Sargent wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value.  Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus.  Spent too much of my life
correcting others’ use of “remembered” id’s that just happened to
perfectly match the wrong thing.

People seem to have stopped appending check digits to identifiers
about 20 years ago, and I'm not sure why.

No the problem is “start from one”. User has item/I’d 10875 in hand
and types in 10785 which of course in a sequence supplied ID steam is
perfectly valid and wrong.  Really hard to track down.

That's my point.  Adding a check digit (turning 10875 into 108753)
would have caught that, since 107853 does not match 107854 (which is
10785 with a check digit added).

Well you forget that 108753 is also a number in the series from 1 to
maxint.   Maybe you're on to something though: a checksum dispensing
sequence!

#18Ron
ronljohnsonjr@gmail.com
In reply to: Rob Sargent (#17)
Re: Primary keys and composite unique keys(basic question)

On 4/7/21 1:16 PM, Rob Sargent wrote:

On 4/7/21 11:59 AM, Ron wrote:

On 4/7/21 11:35 AM, Rob Sargent wrote:

On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:

 On 4/5/21 9:37 PM, Rob Sargent wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value.  Sequential numeric identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus.  Spent too much of my life
correcting others’ use of “remembered” id’s that just happened to
perfectly match the wrong thing.

People seem to have stopped appending check digits to identifiers about
20 years ago, and I'm not sure why.

No the problem is “start from one”. User has item/I’d 10875 in hand and
types in 10785 which of course in a sequence supplied ID steam is
perfectly valid and wrong.  Really hard to track down.

That's my point.  Adding a check digit (turning 10875 into 108753) would
have caught that, since 107853 does not match 107854 (which is 10785 with
a check digit added).

Well you forget that 108753 is also a number in the series from 1 to
maxint.   Maybe you're on to something though: a checksum dispensing sequence!

Call a function which reads the next value from the sequence, appends the
check digit and returns that number.  We were doing that 25 years ago.

--
Angular momentum makes the world go 'round.

#19Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Rob Sargent (#15)
Re: Primary keys and composite unique keys(basic question)

On 2021-04-07 10:35:55 -0600, Rob Sargent wrote:

On Apr 7, 2021, at 10:17 AM, Ron <ronljohnsonjr@gmail.com> wrote:

On 4/5/21 9:37 PM, Rob Sargent wrote:

It's a small thing, but UUIDs are absolutely not memorizable by
humans; they have zero semantic value. Sequential numeric
identifiers
are generally easier to transpose and the value gives some clues to
its age (of course, in security contexts this can be a downside).

I take the above as a definite plus. Spent too much of my life
correcting others’ use of “remembered” id’s that just happened to
perfectly match the wrong thing.

People seem to have stopped appending check digits to identifiers about 20
years ago, and I'm not sure why.

No the problem is “start from one”. User has item/I’d 10875 in hand and types
in 10785 which of course in a sequence supplied ID steam is perfectly valid and
wrong. Really hard to track down.

What I've often done when writing software for an Oracle RDBMS is to use
a global sequence instead of one sequence per table (Oracle didn't have
a SERIAL type, so you had to use an explicit sequence and write trigger
anyway). That caught wrong joins (an id value used in one table is never
used in another table) as well as most typos (since ids in a single
table were not dense).

With PostgreSQL I've stopped doing this since the SERIAL type makes it
much more convenient to have a separate sequence per table. But of
course that means that almost any table will have a row with id 10785
and one with 10875.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"