Random-looking primary keys in the range 100000..999999

Started by Kynn Jonesalmost 12 years ago9 messagesgeneral
Jump to latest
#1Kynn Jones
kynnjo@gmail.com

I'm looking for a way to implement pseudorandom primary keys in the range
100000..999999.

The randomization scheme does not need to be cryptographically strong. As
long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

/messages/by-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

/messages/by-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the
domain of the permutation has a cardinality that is not a power of 2, as it
is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques
that probably could do what I want to do, but their focus on cryptographic
strength makes learning and implementing them tough going, plus, the
performance will probably be poor, since high workloads are an asset for
such crypto applications. Since cryptographic strength is not something I
need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn

In reply to: Kynn Jones (#1)
Re: Random-looking primary keys in the range 100000..999999

How many rows do you plan on having in this table? Why this particular key
range?

depesz

On Fri, Jul 4, 2014 at 3:24 PM, Kynn Jones <kynnjo@gmail.com> wrote:

Show quoted text

I'm looking for a way to implement pseudorandom primary keys in the range
100000..999999.

The randomization scheme does not need to be cryptographically strong. As
long as it is not easy to figure out in a few minutes it's good enough.

My starting point for this is the following earlier message to this list:

/messages/by-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

/messages/by-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where the
domain of the permutation has a cardinality that is not a power of 2, as it
is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption" techniques
that probably could do what I want to do, but their focus on cryptographic
strength makes learning and implementing them tough going, plus, the
performance will probably be poor, since high workloads are an asset for
such crypto applications. Since cryptographic strength is not something I
need, I'm trying to find non-crypt-grade alternatives.)

Thanks in advance!

kynn

#3Kynn Jones
kynnjo@gmail.com
In reply to: hubert depesz lubaczewski (#2)
Re: Random-looking primary keys in the range 100000..999999

On Fri, Jul 4, 2014 at 10:13 AM, hubert depesz lubaczewski <depesz@gmail.com

wrote:

How many rows do you plan on having in this table?

Currently, only around 10K, but there's expectation that the number will
grow. It's hard to predict how much, hence the generous extra space.

Why this particular key range?

The requirements I've been given for the keys is that they be numeric,
reasonably easy to type (hence, no 40-digit keys), never beginning with 0,
and carrying no additional information content (or even suggesting it).
Among the pieces of information that the key should not include is the
relative time of entry into the DB (hence, the keys should be more or less
evenly distributed over the 100K-1M range).

k

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kynn Jones (#3)
Re: Random-looking primary keys in the range 100000..999999

Kynn Jones <kynnjo@gmail.com> writes:

The requirements I've been given for the keys is that they be numeric,
reasonably easy to type (hence, no 40-digit keys), never beginning with 0,
and carrying no additional information content (or even suggesting it).

Why not just

(random()*899999)::int + 100000

This is unlikely to be cryptographically secure, but you didn't say you
needed that. You will need to check for collisions, but that seems like
something you really ought to do anyway.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Kynn Jones (#1)
Re: Random-looking primary keys in the range 100000..999999

On 05/07/14 01:24, Kynn Jones wrote:

I'm looking for a way to implement pseudorandom primary keys in the
range 100000..999999.

The randomization scheme does not need to be cryptographically
strong. As long as it is not easy to figure out in a few minutes it's
good enough.

My starting point for this is the following earlier message to this list:

/messages/by-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

/messages/by-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case where
the domain of the permutation has a cardinality that is not a power of
2, as it is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption"
techniques that probably could do what I want to do, but their focus
on cryptographic strength makes learning and implementing them tough
going, plus, the performance will probably be poor, since high
workloads are an asset for such crypto applications. Since
cryptographic strength is not something I need, I'm trying to find
non-crypt-grade alternatives.)

Thanks in advance!

kynn

Hi Kynn,

How about (note that 'payload' could be any set of valid columns):

-- using a crude Linear Congruential Generator
-- not very random, but does NOT create duplicates

DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
id int PRIMARY KEY default(100000 + (nextval('rseq') * 543537 +
997) % 900000),
payload int NOT NULL
);

INSERT INTO rtab (payload) VALUES (generate_series(1, 100000));

TABLE rtab;

Sample output:

id | payload
--------+---------
644534 | 1
288071 | 2
831608 | 3
475145 | 4
118682 | 5
662219 | 6
305756 | 7
849293 | 8
492830 | 9
136367 | 10
679904 | 11
323441 | 12
866978 | 13
510515 | 14
154052 | 15
697589 | 16
341126 | 17
884663 | 18
528200 | 19
171737 | 20

Cheers,
Gavin

#6Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Gavin Flower (#5)
Re: Random-looking primary keys in the range 100000..999999

On 05/07/14 15:48, Gavin Flower wrote:

On 05/07/14 01:24, Kynn Jones wrote:

I'm looking for a way to implement pseudorandom primary keys in the
range 100000..999999.

The randomization scheme does not need to be cryptographically
strong. As long as it is not easy to figure out in a few minutes
it's good enough.

My starting point for this is the following earlier message to this list:

/messages/by-id/49F96730.4000706@postnewspapers.com.au

The answer given to it here

/messages/by-id/448163db-cac5-4e99-8c4c-57cbc6f6af78@mm

...is really cool, but I don't see how to modify it for the case
where the domain of the permutation has a cardinality that is not a
power of 2, as it is in my case (cardinality = 900000).

---

(In the crypto world there are "format preserving encryption"
techniques that probably could do what I want to do, but their focus
on cryptographic strength makes learning and implementing them tough
going, plus, the performance will probably be poor, since high
workloads are an asset for such crypto applications. Since
cryptographic strength is not something I need, I'm trying to find
non-crypt-grade alternatives.)

Thanks in advance!

kynn

Hi Kynn,

How about (note that 'payload' could be any set of valid columns):

-- using a crude Linear Congruential Generator
-- not very random, but does NOT create duplicates

DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
id int PRIMARY KEY default(100000 + (nextval('rseq') * 543537
+ 997) % 900000),
payload int NOT NULL
);

INSERT INTO rtab (payload) VALUES (generate_series(1, 100000));

TABLE rtab;

Sample output:

id | payload
--------+---------
644534 | 1
288071 | 2
831608 | 3
475145 | 4
118682 | 5
662219 | 6
305756 | 7
849293 | 8
492830 | 9
136367 | 10
679904 | 11
323441 | 12
866978 | 13
510515 | 14
154052 | 15
697589 | 16
341126 | 17
884663 | 18
528200 | 19
171737 | 20

Cheers,
Gavin

Hmm...

for a 10 times larger range
id int PRIMARY KEY default(1000000 + (nextval('rseq') *
543537 + 997) % 9000000),
also works!

#7Martijn van Oosterhout
kleptog@svana.org
In reply to: Kynn Jones (#1)
Re: Random-looking primary keys in the range 100000..999999

On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:

I'm looking for a way to implement pseudorandom primary keys in the range
100000..999999.

The randomization scheme does not need to be cryptographically strong. As
long as it is not easy to figure out in a few minutes it's good enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981) ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer

#8Kynn Jones
kynnjo@gmail.com
In reply to: Martijn van Oosterhout (#7)
Re: Random-looking primary keys in the range 100000..999999

Thanks to Gavin and Martijn for their suggestions. They're both simple
good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly because
I understand better how it avoids collisions, so it was easier for me to
tweak it in various ways.

In particular, I changed the prime number from 899981 to the very lucky
prime 900001. This happens to work *perfectly*, because the range of such
a generator is p-1, not p. (BTW, Martijn's choice of the "random" 2345 for
the multiplier was a somewhat lucky one, since such generators are not full
for arbitrary multipliers; for example, the one with modulus 899981 is not
full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of python's
pow function, and implemented a 3-argument pow for PL/PgSQL. I include all
the code below, including a snippet borrowed from Gavin's post, and
modified here and there. (I'm not very experienced with PL/PgSQL, so
please feel free to point out ways in which my PL/PgSQL code can be
improved.)

First the functions:

CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m bigint)
returns bigint AS $$
DECLARE
x bigint;
xx bigint;
BEGIN
IF n = 0 THEN RETURN 1; END IF;

x := bigx % m;
xx := (x * x) % m;

IF n % 2 = 0 THEN
RETURN pow_mod(xx, n/2, m);
ELSE
RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
END IF;

END;
$$ LANGUAGE plpgsql strict immutable;

-- "mcg" = "multiplicative congruential generator"
CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
BEGIN
-- CHECK (0 < i AND i < 900001)

RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
END;
$$ LANGUAGE plpgsql strict immutable;

And here's a small demo:

DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
id int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
payload int NOT NULL
);

\timing on \\ INSERT INTO rtab (payload) VALUES (generate_series(1,
900000)); \timing off
-- Timing is on.
-- INSERT 0 900000
-- Time: 201450.781 ms
-- Timing is off.

SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
-- id | payload
-- --------+---------
-- 539815 | 449991
-- 901731 | 449992
-- 878336 | 449993
-- 564275 | 449994
-- 863664 | 449995
-- 720159 | 449996
-- 987833 | 449997
-- 999471 | 449998
-- 999977 | 449999
-- 999999 | 450000
-- 921739 | 450001
-- 722684 | 450002
-- 596638 | 450003
-- 121592 | 450004
-- 687895 | 450005
-- 477734 | 450006
-- 585988 | 450007
-- 942869 | 450008
-- 175776 | 450009
-- 377207 | 450010
-- (20 rows)

kj

On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout <kleptog@svana.org>
wrote:

Show quoted text

On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:

I'm looking for a way to implement pseudorandom primary keys in the range
100000..999999.

The randomization scheme does not need to be cryptographically strong.

As

long as it is not easy to figure out in a few minutes it's good enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981) ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he

does

not attach much importance to his own thoughts.

-- Arthur Schopenhauer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
dwQY8zHxuOQ=
=IJFu
-----END PGP SIGNATURE-----

#9Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Kynn Jones (#8)
Re: Random-looking primary keys in the range 100000..999999

Please don't top post!

See below for my comments.

On 09/07/14 07:04, Kynn Jones wrote:

Thanks to Gavin and Martijn for their suggestions. They're both simple
good-ol' LCGs, and both avoid the need to check for collisions.

I ultimately went with a multiplicative LCG, like Martijn's, mostly
because I understand better how it avoids collisions, so it was easier
for me to tweak it in various ways.

In particular, I changed the prime number from 899981 to the very
lucky prime 900001. This happens to work *perfectly*, because the
range of such a generator is p-1, not p. (BTW, Martijn's choice of
the "random" 2345 for the multiplier was a somewhat lucky one, since
such generators are not full for arbitrary multipliers; for example,
the one with modulus 899981 is not full for a multiplier of 3456, say.)

I also followed Martijn's pointer regarding the 3-argument form of
python's pow function, and implemented a 3-argument pow for PL/PgSQL.
I include all the code below, including a snippet borrowed from
Gavin's post, and modified here and there. (I'm not very experienced
with PL/PgSQL, so please feel free to point out ways in which my
PL/PgSQL code can be improved.)

First the functions:

CREATE OR REPLACE FUNCTION pow_mod(bigx bigint, n bigint, m
bigint) returns bigint AS $$
DECLARE
x bigint;
xx bigint;
BEGIN
IF n = 0 THEN RETURN 1; END IF;

x := bigx % m;
xx := (x * x) % m;

IF n % 2 = 0 THEN
RETURN pow_mod(xx, n/2, m);
ELSE
RETURN (x * pow_mod(xx, (n-1)/2, m)) % m;
END IF;

END;
$$ LANGUAGE plpgsql strict immutable;

-- "mcg" = "multiplicative congruential generator"
CREATE OR REPLACE FUNCTION mcg_900001(i bigint) returns int AS $$
BEGIN
-- CHECK (0 < i AND i < 900001)
RETURN 99999 + pow_mod(<INSERT YOUR MULTIPLIER HERE>, i, 900001);
END;
$$ LANGUAGE plpgsql strict immutable;

And here's a small demo:

DROP TABLE IF EXISTS rtab;
DROP SEQUENCE IF EXISTS rseq;

CREATE SEQUENCE rseq;

CREATE TABLE rtab
(
id int PRIMARY KEY DEFAULT mcg_900001(nextval('rseq')),
payload int NOT NULL
);

\timing on \\ INSERT INTO rtab (payload) VALUES
(generate_series(1, 900000)); \timing off
-- Timing is on.
-- INSERT 0 900000
-- Time: 201450.781 ms
-- Timing is off.

SELECT * FROM rtab WHERE 449990 < payload AND payload < 450011;
-- id | payload
-- --------+---------
-- 539815 | 449991
-- 901731 | 449992
-- 878336 | 449993
-- 564275 | 449994
-- 863664 | 449995
-- 720159 | 449996
-- 987833 | 449997
-- 999471 | 449998
-- 999977 | 449999
-- 999999 | 450000
-- 921739 | 450001
-- 722684 | 450002
-- 596638 | 450003
-- 121592 | 450004
-- 687895 | 450005
-- 477734 | 450006
-- 585988 | 450007
-- 942869 | 450008
-- 175776 | 450009
-- 377207 | 450010
-- (20 rows)

kj

On Sat, Jul 5, 2014 at 4:35 AM, Martijn van Oosterhout
<kleptog@svana.org <mailto:kleptog@svana.org>> wrote:

On Fri, Jul 04, 2014 at 09:24:31AM -0400, Kynn Jones wrote:

I'm looking for a way to implement pseudorandom primary keys in

the range

100000..999999.

The randomization scheme does not need to be cryptographically

strong. As

long as it is not easy to figure out in a few minutes it's good

enough.

Well, a trick that produces a not too easy to guess sequence is:

X(n) = p^n mod q

where q is prime. Pick the largest prime that will fit, in this case
899981 (I beleive) and some random p, say 2345.

Then 100000 + (2345^n) mod 899981

should be a sequence fitting your purpose. Unfortunatly, the pow()
function in Postgres can't be used here (too slow and it overflows),
but python has a helpful function:

In [113]: len( set( pow(2345, n, 899981) for n in range(899981) ) )
Out[113]: 899980

You could probably write an equivalent function in Postgres if
necessary.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org
<mailto:kleptog@svana.org>> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset

that he does

not attach much importance to his own thoughts.

-- Arthur Schopenhauer

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQIVAwUBU7e450vt++dL5i1EAQhwew/9Fps1rkjMl85kAhD4nj9i5Gy+Y6T71vbS
gXkHyJOVHr9r9kT8/1shG8OtTDEIKI1FEjDD5wdkTvj+K//wswPcpCIcj5eJVu5K
56v8ITYnc3/YeYthoBI829adAreP7kjBAJlB8lENTAbxkdJmRBEGA3KjEnSLj7I/
pdqlrrbkUq7r/OBFlJYFnv/YXLAFeOWQRAk+Be+UorAUmkrvoA0g7gW4VEFnQ1Qk
k1kTYIEU3HUXVDHUeYTC2jjLj7cFVhYaQ52FA950MzkpkqFAej34gpitcOFC8yf+
KSglMq4nAFNF6VCU50RwPLjMIXXbHTSYxjJ5n3qYo4CExlg0wBLcmuu25GHc69qP
wEx71lPvXb4yfI3YNNHcH3Cwgl46u5M5Dt2aqWDcr+haAy8Hmhm5zqjTcfpUhyD+
efi8B512YDr4HoDV6qEKx0MdjHUFptX34L8tjkmnNYQlXj89ATE82lUoTusiIxts
axwJwbjl81cg3ZbtfoWPQ3LXXSRNI0NhIkHX0k2xp3uJ+TX76UmPZYSzQ3M2QrhX
atFCkcE4RqY26zwtxCp27yjnKMsMkEeo4z7JIQKjkwLzHGPavNd2PFV1EfCXNhwz
aDXXZHzwymJjhgP1BH0mXrL6VD0UiQb3RTRH82RpG0MaNBRImcncCAjKlN5UzDur
dwQY8zHxuOQ=
=IJFu
-----END PGP SIGNATURE-----

Didn't I mention that my code & variable names are copyrighted and my
methods patented? :-)

More seriously, its great that you picked out what you want from
multiple replies! Understanding what is given to you is far better than
blindly cutting & pasting.

Actually I just realized that having an additive constant is essentially
meaningless in this Use Case, as we are feeding in a sequence of
consecutive integers.

Cheers,
Gavin