gen_random_uuid key collision

Started by jesusthefrogover 4 years ago11 messagesgeneral
Jump to latest
#1jesusthefrog
jesusthefrog@gmail.com

Hello, I'm hoping someone might be able to shed a little light on a strange
situation I encountered recently.

I work with a postgres instance which has dozens (probably hundreds) of
tables which each have a column defined as "uuid primary key default
gen_random_uuid()".

Most of the time this is fine, but one specific table has recently started
repeatedly having inserts fail because of a unique constraint violation
involving the primary key. The table only has ~10,000 rows, but I'll
sometimes see two or three collisions in a single day. No other table (even
those with many, many more rows) exhibit this issue.

We're running postgres 12, so I believe the gen_random_uuid function is
provided by the pgcrypto extension, but either way it'll be the same for
that entire database instance, so I can't explain why only one table would
be having problems if it were due to a bug in the function. Also, since I
believe it just uses openssl (which we have linked) to generate random
bytes, the chance of a bug should be very low.

Anyone have any thoughts on this?

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: jesusthefrog (#1)
Re: gen_random_uuid key collision

On 9/2/21 4:25 PM, jesusthefrog wrote:

Hello, I'm hoping someone might be able to shed a little light on a
strange situation I encountered recently.

Most of the time this is fine, but one specific table has recently
started repeatedly having inserts fail because of a unique constraint
violation involving the primary key. The table only has ~10,000 rows,
but I'll sometimes see two or three collisions in a single day. No other
table (even those with many, many more rows) exhibit this issue.

What is the table schema as returned by \d <table> in psql?

Anyone have any thoughts on this?

--
Adrian Klaver
adrian.klaver@aklaver.com

#3jesusthefrog
jesusthefrog@gmail.com
In reply to: Adrian Klaver (#2)
Re: gen_random_uuid key collision

On Thu, Sep 2, 2021 at 7:35 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

What is the table schema as returned by \d <table> in psql?

The tables are in various schemas; that one is in one called
"access_control", but we always set the search path explicitly to (in this
case) "access_control, public".
Anyway, if if were a problem with finding the function, I would be seeing a
different error .The function successfully runs, it is just (apparently)
occasionally generating the same UUID multiple times.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: jesusthefrog (#1)
Re: gen_random_uuid key collision

jesusthefrog <jesusthefrog@gmail.com> writes:

I work with a postgres instance which has dozens (probably hundreds) of
tables which each have a column defined as "uuid primary key default
gen_random_uuid()".

Most of the time this is fine, but one specific table has recently started
repeatedly having inserts fail because of a unique constraint violation
involving the primary key. The table only has ~10,000 rows, but I'll
sometimes see two or three collisions in a single day. No other table (even
those with many, many more rows) exhibit this issue.

That is pretty weird, all right. The only idea that comes to mind
immediately is that maybe that table's pkey index is corrupt and needs
to be reindexed. This isn't a great theory, because I don't see why
a corrupt index would lead to bogus unique-constraint errors rather
than missed ones. But at least it squares with the observation that
only that table is having issues.

BTW, are you *entirely* certain that your application never inserts
non-default values into that column?

regards, tom lane

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: jesusthefrog (#3)
Re: gen_random_uuid key collision

On 9/2/21 4:38 PM, jesusthefrog wrote:

On Thu, Sep 2, 2021 at 7:35 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

What is the table schema as returned by \d <table> in psql?

The tables are in various schemas; that one is in one called

But only one is generating errors. Schema refers to an object's
definition as well as a namespace. So what does:

\d <table>

return?

"access_control", but we always set the search path explicitly to (in
this case) "access_control, public".
Anyway, if if were a problem with finding the function, I would be
seeing a different error .The function successfully runs, it is just
(apparently) occasionally generating the same UUID multiple times.

--
-----BEGIN GEEK CODE BLOCK-----
  Version: 3.12
  GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
  N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
  X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------

--
Adrian Klaver
adrian.klaver@aklaver.com

#6jesusthefrog
jesusthefrog@gmail.com
In reply to: Tom Lane (#4)
Re: gen_random_uuid key collision

On Thu, Sep 2, 2021 at 7:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

BTW, are you *entirely* certain that your application never inserts
non-default values into that column?

regards, tom lane

Yes, I double checked that we never attempt to bind a value for that
column. I'll have a go at just rebuilding the pkey index and see if that
helps.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------

#7jesusthefrog
jesusthefrog@gmail.com
In reply to: Adrian Klaver (#5)
Re: gen_random_uuid key collision

On Thu, Sep 2, 2021 at 8:05 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

But only one is generating errors. Schema refers to an object's
definition as well as a namespace. So what does:

\d <table>

return?

I see what you mean. I don't have access to the instance at the moment so
I'd have to take a look tomorrow.
What, specifically, would be interesting in that output? When I looked at
it this morning, I didn't see anything which looked out of the ordinary to
me.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------

#8Adrian Klaver
adrian.klaver@aklaver.com
In reply to: jesusthefrog (#7)
Re: gen_random_uuid key collision

On 9/2/21 5:38 PM, jesusthefrog wrote:

On Thu, Sep 2, 2021 at 8:05 PM Adrian Klaver <adrian.klaver@aklaver.com
<mailto:adrian.klaver@aklaver.com>> wrote:

But only one is generating errors. Schema refers to an object's
definition as well as a namespace. So what does:

\d <table>

return?

I see what you mean. I don't have access to the instance at the moment
so I'd have to take a look tomorrow.
What, specifically, would be interesting in that output? When I looked

I have no idea, probably more about eliminating possibilities then anything.

at it this morning, I didn't see anything which looked out of the
ordinary to me.

--
-----BEGIN GEEK CODE BLOCK-----
  Version: 3.12
  GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
  N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
  X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------

--
Adrian Klaver
adrian.klaver@aklaver.com

In reply to: Tom Lane (#4)
Re: gen_random_uuid key collision

On Thu, Sep 2, 2021 at 4:48 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That is pretty weird, all right. The only idea that comes to mind
immediately is that maybe that table's pkey index is corrupt and needs
to be reindexed. This isn't a great theory, because I don't see why
a corrupt index would lead to bogus unique-constraint errors rather
than missed ones. But at least it squares with the observation that
only that table is having issues.

This is easy enough to check using the contrib/amcheck extension.

jesusthefrog could try this, and report back what they see:

CREATE EXTENSION IF NOT EXISTS amcheck
SELECT bt_index_check('my_uuid_index', true);

If that doesn't show any errors, then there is a chance that this will:

SELECT bt_index_parent_check('my_uuid_index', true);

Note that the parent variant takes a disruptive lock that will block
write DML. You might prefer to just use the first query if this is
running in a production environment.

--
Peter Geoghegan

#10Mark Dilger
mark.dilger@enterprisedb.com
In reply to: jesusthefrog (#1)
Re: gen_random_uuid key collision

On Sep 2, 2021, at 4:25 PM, jesusthefrog <jesusthefrog@gmail.com> wrote:

Anyone have any thoughts on this?

I agree with Peter's suggestion upthread to run amcheck on the index, but if that comes back with no corruption, can you verify that there are no rules or triggers that might cause multiple copies of the rows to be inserted? Likewise, can you verify that you have no replication subscriptions that could be putting duplicates into the table?

Another idea that seems unlikely given your lack of trouble with other tables is that you might check whether you have any functions that reset the seed for your random generator. I haven't looked specifically at your uuid generator, and I don't know if it gets nondeterministic randomness from /dev/random or similar, but deterministic random generators can be made to produce the same sequence again if the seed it reset.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11jesusthefrog
jesusthefrog@gmail.com
In reply to: Peter Geoghegan (#9)
Re: gen_random_uuid key collision

Note that the parent variant takes a disruptive lock that will block
write DML. You might prefer to just use the first query if this is
running in a production environment.

Fortunately this has only been observed on the dev instance.
This morning I tried just dropping and recreating the index, so I'll see if
that has solved it. If it has, then the root cause may remain a mystery; if
not, I'll try amcheck.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GIT d- s+ a- C++++ L+++ S++ B+ P++>++++ E++ W+++
N o? K- !w++++ O- M- V? PS++ PE- Y+ PGP t+ 5+++
X R+ tv b+++ DI++ D++ G+ e-- h- r++ y
------END GEEK CODE BLOCK------