UUID v7

amborodin86@gmail.com

almost 3 years ago

In reply to: Tom Lane (#3)

Re: UUID v7

On Fri, Feb 10, 2023 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:

Perhaps we should name the function something like
gen_time_ordered_random_uuid() instead? That gives us a bit more flexibility
about what uuid version we generate. And it might be easier for users, anyway.

I think users would be happy with any name.

Still not sure what version we'd best use for now. Perhaps v8?

V8 is just a "custom data" format. Like "place whatever you want".
Though I agree that its sample implementation looks to be better.

On Fri, Feb 10, 2023 at 5:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
turn out to not be one.

I think there is no need to rush this into v16. Let's wait for the
standardization process to play out.

Standardization per se does not bring value to users. However, I agree
that eager users can just have it today as an extension and be happy
with it [0].
Maybe it's fine to wait a year for others...

Best regards, Andrey Borodin.

[0]: https://github.com/x4m/pg_uuid_next

peter.eisentraut@enterprisedb.com

almost 3 years ago

In reply to: Andres Freund (#2)

Re: UUID v7

On 11.02.23 02:14, Andres Freund wrote:

On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:

As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance.

I agree it's very useful to have.

[0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04

That looks to not be the current version anymore, it's superseded by:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis

Yes, this means that the draft that an individual had uploaded has now
been taken on by a working group for formal review. If there is a
prototype implementation, this is a good time to provide feedback. But
it's too early to ship a production version.

Kyzer Davis (kydavis)

kydavis@cisco.com

almost 3 years ago

In reply to: Peter Eisentraut (#5)

1 attachment(s)

RE: UUID v7

Hello Group,

I am happy to see others interested in the improvements provided by UUIDv7!

I caught up on the thread and you all are correct.

Work has moved on GitHub from uuid6/uuid6-ietf-draft to
ietf-wg-uuidrev/rfc4122bis
- Draft 00 merged RFC4122 with Draft 04 and fixed as many problems as possible
with RFC4122.
- Draft 01 continued to iterate on RFC4122 problems:
https://author-tools.ietf.org/iddiff?url2=draft-ietf-uuidrev-rfc4122bis-01
- Draft 02 items being changed are summarized in the latest PR for review in
the upcoming interim meeting (Feb 16th):
https://github.com/ietf-wg-uuidrev/rfc4122bis/pull/60
Note: Draft 02 should be published by the end of the week and long term we
have one more meeting at IETF 116 to iron out the replacement of RFC4122,
perform last call and submit to the IESG for official review and consideration
for replacement of RFC4122 (actual timeline for that varies based on what IESG
wants me to fix.)

That all being said:
The point is 99% of the work since adoption by the IETF has been ironing out
RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a
very good state.

If anybody has any feedback from draft review or prototyping, please
either email uuidrev@ietf.org or drop an issue on the tracker:
https://github.com/ietf-wg-uuidrev/rfc4122bis/issues

Lastly, I have added the C/SQL implementation to the prototypes page below:
https://github.com/uuid6/prototypes

Thanks!

samokhvalov@gmail.com

over 2 years ago

In reply to: Kyzer Davis (kydavis) (#6)

Re: UUID v7

On Tue, Feb 14, 2023 at 6:13 AM Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:

I am happy to see others interested in the improvements provided by UUIDv7!

Thank you for providing the details!

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
https://commitfest.postgresql.org/43/4388/
- recent post by Ants Aasma, Cybertec about the downsides of
traditional UUID raised a big discussion today on HN:
https://news.ycombinator.com/item?id=36429986

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Nikolay Samokhvalov (#7)

Re: UUID v7

On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:

I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway. Committing anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.

--
Daniel Gustafsson

Matthias van de Meent

boekewurm+postgres@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#8)

Re: UUID v7

On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:

On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:

I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway. Committing anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.

I'd like to note that this draft has recently had its last call
period, and has been proposed for publishing early last month. I don't
know how long this publishing process usually takes, but it seems like
the WG considers the text final, so unless this would take months I
wouldn't mind keeping this patch around as "waiting for external
process to complete". Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#10

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Matthias van de Meent (#9)

Re: UUID v7

On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:

On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:

Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:

I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway. Committing anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.

I'd like to note that this draft has recently had its last call
period, and has been proposed for publishing early last month.

Sure, but this document is in AD Evaluation and there are many stages left in
the IESG process, it may still take a fair bit of time before this is done.

Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.

I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project. I'm only commenting
on this particular Internet standard which we have plenty of time to commit
before v17 without rushing to beat a standards committee.

Also, if you look you can see that I moved it to the next CF in a vague hope
that standardization will be swift (which it admittedly never is).

--
Daniel Gustafsson

#11

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Daniel Gustafsson (#10)

Re: UUID v7

Daniel Gustafsson <daniel@yesql.se> writes:

On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.

I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project.

We have done a couple of things that way recently. An important
reason why we felt we could get away with that is that nowadays
we have people who actually sit on the SQL committee and have
reliable information on what's likely to make it into the final text
of the next version. I don't think we have equivalent visibility or
should have equivalent confidence about how UUID v7 standardization
will play out.

regards, tom lane

#12

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Tom Lane (#11)

Re: UUID v7

On 06.07.23 16:02, Tom Lane wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.

I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project.

We have done a couple of things that way recently. An important
reason why we felt we could get away with that is that nowadays
we have people who actually sit on the SQL committee and have
reliable information on what's likely to make it into the final text
of the next version. I don't think we have equivalent visibility or
should have equivalent confidence about how UUID v7 standardization
will play out.

(I have been attending some meetings and I'm on the mailing list.)

Anyway, I think it would be reasonable to review this patch now. We
might leave it hanging in "Ready for Committer" for a while when we get
there. But surely review can start now.

#13

[0]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01

x4mmm@yandex-team.ru

over 2 years ago

In reply to: Peter Eisentraut (#12)

1 attachment(s)

Re: UUID v7

On 6 Jul 2023, at 21:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

I think it would be reasonable to review this patch now.

+1.

Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. Revision 1 of the IETF draft contained interesting code for v8: almost identical to v7, but with fields for "node ID" and "rolling sequence number".
I think this is a reasonable approach, so I attach an implementation of UUID v8 per [0]. But from my point of view this implementation has some flaws.
These two new fields, "node ID" and "sequence", are there not for uniqueness, but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after the randomly generated numbers.

I think that "sequence" is there to help generate locally ascending identifiers when the real-time clock does not provide enough resolution. So the "sequence" field must be placed after the 6 bytes of the time-generated identifier.

On the contrary, "node ID" must differentiate identifiers generated on different nodes. So it makes sense to place "node ID" before the timestamp, so that identifiers generated on different nodes will tend to be in different ranges.
However, section "6.4. Distributed UUID Generation" states that "node ID" is there to decrease the likelihood of a collision. So my intuition might be wrong here.

Do we want to provide this "vendor-specific" UUID with tweaks for databases? Or should we limit the scope to the well-defined UUID v7?

Best regards, Andrey Borodin.

Attachments:

v2-0001-Implement-UUID-v7-and-v8-as-per-IETF-draft.patch

From 9f4c97a81aae3087581a024374a06e49156f2689 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <xformmm@amazon.com>
Date: Fri, 10 Feb 2023 15:38:40 -0800
Subject: [PATCH v2] Implement UUID v7 and v8 as per IETF draft

---
 doc/src/sgml/func.sgml                   | 18 ++++-
 src/backend/utils/adt/uuid.c             | 87 ++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  6 ++
 src/test/regress/expected/opr_sanity.out |  2 +
 src/test/regress/expected/uuid.out       | 20 ++++++
 src/test/regress/sql/uuid.sql            | 12 ++++
 6 files changed, 144 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5a47ce4343..b8b5ee210a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13947,13 +13947,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>gen_uuid_v8</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes three functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
+<synopsis>
+<function>gen_uuid_v8</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 8 (time-ordered + random + node ID + rolling sequence number) UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..44deead6b1 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -421,3 +424,87 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	struct timeval tp;
+	uint64_t tms;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000;
+	tms += ((uint64_t)tp.tv_usec) / 1000;
+
+	tms = pg_hton64(tms<<16);
+
+	/* Fill in time part */
+	memcpy(&uuid->data[0], &tms, 6);
+
+	/* fill everything after the timestamp with random bytes */
+	if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+static uint8_t sequence_counter;
+
+Datum
+gen_uuid_v8(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	struct timeval tp;
+	uint32_t t;
+	uint16_t ut;
+	uint8_t node_id = GetSystemIdentifier();
+	uint8_t sequence = sequence_counter++;
+
+	/*
+	TODO: Consider supplying node ID and rolling sequence number
+	if (PG_NARGS() >= 1)
+		node_id = PG_GETARG_CHAR(0);
+	if (PG_NARGS() >= 2)
+		sequence = PG_GETARG_CHAR(1);
+	*/
+
+	gettimeofday(&tp, NULL);
+	t = tp.tv_sec - 1577836800;
+	t = pg_hton32(t);
+	memcpy(&uuid->data[0], &t, 4);
+
+	/* 16 bit subsecond fraction (~15 microsecond resolution) */
+	ut = ((uint64_t)tp.tv_usec << 16) / 1000000;
+	memcpy(&uuid->data[4], &ut, 2);
+
+	/* fill everything after the timestamp with random bytes */
+	if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 8" UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-creating-a-uuidv8-value
+	 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x80;
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	uuid->data[14] = node_id;
+	uuid->data[15] = sequence;
+
+	PG_RETURN_UUID_P(uuid);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..0c82f9280f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9119,6 +9119,12 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '3813', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
+{ oid => '3814', descr => 'generate UUID version 8',
+  proname => 'gen_uuid_v8', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v8' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c0b5..1fb9c654d3 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -857,6 +857,8 @@ sha384(bytea)
 sha512(bytea)
 gen_random_uuid()
 starts_with(text,text)
+gen_uuid_v7()
+gen_uuid_v8()
 macaddr8_eq(macaddr8,macaddr8)
 macaddr8_lt(macaddr8,macaddr8)
 macaddr8_le(macaddr8,macaddr8)
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..516d4998a7 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,25 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v8
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v8());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v8());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..0d6784e70b 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,17 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v8
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v8());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v8());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

#14

Kyzer Davis (kydavis)

kydavis@cisco.com

over 2 years ago

In reply to: Andrey M. Borodin (#13)

1 attachment(s)

RE: UUID v7

Great discussions group,

I think it would be reasonable to review this patch now.

I am happy to review the format and logic for any proposed v7 and/or v8
UUID. Just point me to a PR or some code review location.

"Distributed UUID Generation" states that "node ID" is there to decrease
the likelihood of a collision.

Correct, node identifiers help provide some bit space that ensures no
collision in the event the stars align where two nodes create the exact
UUID.

From what I have seen, UUIDv7 should meet the requirements outlined thus far
in this thread.

Also, to add: there are two UUID prototypes for postgres from my checks.
(Although they are outdated relative to the latest draft sent up for official
publication, so review them from an academic perspective.)
- https://github.com/uuid6/prototypes
- pg_uuid_next (see this thread which nicely summarizes some UUIDv7
"checkboxes" https://github.com/x4m/pg_uuid_next/issues/1)
- UUID_v7_for_Postgres.sql

Don't forget, if we have UUIDv1 already implemented in the postgres code you
may want to examine UUIDv6.
UUIDv6 is simply a fork of that code with the timestamp bits swapped.
In terms of effort, UUIDv6 is easy to implement and gives you a time-ordered
UUID re-using 99% of the code you may already have.
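
To make the "swap of the timestamp bits" concrete, here is a minimal standalone C sketch, not taken from any patch in this thread. It assumes the field layouts as I understand them from the rfc4122bis draft: v1 stores the 60-bit timestamp as time_low, time_mid, time_hi, while v6 stores the same value most significant bits first. The function name `uuid1_to_uuid6` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite a version 1 UUID as a version 6 UUID by
 * reordering the 60-bit timestamp. Clock sequence and node are copied as-is.
 */
static void
uuid1_to_uuid6(uint8_t out[16], const uint8_t in[16])
{
	uint64_t	time_low;
	uint64_t	time_mid;
	uint64_t	time_hi;
	uint64_t	ts;

	assert((in[6] >> 4) == 1);	/* input must carry version 1 */

	/* v1 layout: time_low (32 bits), time_mid (16), version + time_hi (12) */
	time_low = ((uint64_t) in[0] << 24) | ((uint64_t) in[1] << 16) |
		((uint64_t) in[2] << 8) | (uint64_t) in[3];
	time_mid = ((uint64_t) in[4] << 8) | (uint64_t) in[5];
	time_hi = ((uint64_t) (in[6] & 0x0f) << 8) | (uint64_t) in[7];
	ts = (time_hi << 48) | (time_mid << 32) | time_low;

	/* v6 layout: top 32 bits, next 16 bits, version + lowest 12 bits */
	out[0] = (uint8_t) (ts >> 52);
	out[1] = (uint8_t) (ts >> 44);
	out[2] = (uint8_t) (ts >> 36);
	out[3] = (uint8_t) (ts >> 28);
	out[4] = (uint8_t) (ts >> 20);
	out[5] = (uint8_t) (ts >> 12);
	out[6] = (uint8_t) (0x60 | ((ts >> 8) & 0x0f));
	out[7] = (uint8_t) (ts & 0xff);

	/* variant, clock sequence and node are unchanged */
	memcpy(&out[8], &in[8], 8);
}
```

Because the most significant timestamp bits now come first, a plain byte-wise comparison of two such values orders them by creation time, which is the property the thread is after.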

Lastly, my advice on v8 is that I would examine/implement v6 or v7 first
before jumping to v8
because whatever you do for implementing v6 or v7 will help you implement a
better v8.
There are also a number of v8 prototype implementations (at the previous
link) if somebody wants to give them a scroll.

Happy to answer any other questions where I can provide input.

Thanks,

#15

peter.eisentraut@enterprisedb.com

over 2 years ago

In reply to: Andrey M. Borodin (#13)

Re: UUID v7

On 07.07.23 14:06, Andrey M. Borodin wrote:

Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. Revision 1 of IETF draft contained interesting code for v8: almost similar to v7, but with fields for "node ID" and "rolling sequence number".
I think this is reasonable approach, thus I attach implementation of UUID v8 per [0].

I suggest we keep this thread to v7, which has pretty straightforward
semantics for PostgreSQL. v8 by definition has many possible
implementations, so you're going to have to make pretty strong arguments
that yours is the best and only one, if you are going to claim the
gen_uuid_v8 function name.

#16

[0]: https://github.com/x4m/pg_uuid_next/issues/1#issuecomment-1657074776

x4mmm@yandex-team.ru

over 2 years ago

In reply to: Peter Eisentraut (#15)

1 attachment(s)

Re: UUID v7

On 10 Jul 2023, at 21:50, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

I suggest we keep this thread to v7, which has pretty straightforward semantics for PostgreSQL. v8 by definition has many possible implementations, so you're going to have to make pretty strong arguments that yours is the best and only one, if you are going to claim the gen_uuid_v8 function name.

Thanks Peter, I'll follow this course of action.

After a discussion on GitHub with Sergey Prokhorenko [0], I understood that the counter is an optional but useful part of UUID v7. It actually promotes sortability of data generated at high speed.
The standard does not specify how big the counter should be. PFA a patch with a 16-bit counter. Maybe it is worth doing an 18-bit counter: it would save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.
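
For readers following along without the patch applied, the byte layout described above can be sketched in standalone C. This is only an illustration under stated assumptions, not the attached patch: the helper name `uuid7_pack` is hypothetical, the random bytes are supplied by the caller instead of coming from pg_strong_random(), and the 16-bit counter is split 4 + 8 + 4 across bytes 6, 7 and 8. The sketch explicitly carries the two low bits of byte 8 over from the random input, which is one way to read "leaving 2 bits of entropy".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/*
 * Hypothetical helper: build a UUIDv7-style value from a millisecond Unix
 * timestamp, a 16-bit counter and 8 caller-supplied random bytes.
 */
static void
uuid7_pack(uint8_t out[UUID_LEN], uint64_t unix_ms, uint16_t counter,
		   const uint8_t rnd[8])
{
	int			i;

	assert(unix_ms < ((uint64_t) 1 << 48));	/* timestamp field is 48 bits */

	/* bytes 0..5: timestamp, most significant byte first */
	for (i = 0; i < 6; i++)
		out[i] = (uint8_t) (unix_ms >> (8 * (5 - i)));

	/* bytes 8..15 start out as random data */
	memcpy(&out[8], rnd, 8);

	/* byte 6: version 7 in the high nibble, counter bits 15..12 below it */
	out[6] = (uint8_t) (0x70 | (counter >> 12));
	/* byte 7: counter bits 11..4 */
	out[7] = (uint8_t) (counter >> 4);
	/* byte 8: variant bits 1,0, counter bits 3..0, two random bits kept */
	out[8] = (uint8_t) (0x80 | ((counter & 0x0f) << 2) | (rnd[0] & 0x03));
}
```

With an 18-bit counter the last two bits of byte 8 would hold counter bits as well, so the random part would start cleanly at byte 9; that is the one-byte saving mentioned above.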

Best regards, Andrey Borodin.

Attachments:

v3-0001-Implement-UUID-v7-as-per-IETF-draft.patch

From 67f4202555d1c8cf3230235e134d070ddb82173c Mon Sep 17 00:00:00 2001
From: Andrey Borodin <xformmm@amazon.com>
Date: Fri, 10 Feb 2023 15:38:40 -0800
Subject: [PATCH v3] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   | 10 ++++-
 src/backend/utils/adt/uuid.c             | 50 ++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  3 ++
 src/test/regress/expected/opr_sanity.out |  1 +
 src/test/regress/expected/uuid.out       | 10 +++++
 src/test/regress/sql/uuid.sql            |  6 +++
 6 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..b2d89cf415 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13947,13 +13947,21 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..49f9c03995 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -421,3 +424,50 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint16_t sequence_counter;
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	struct timeval tp;
+	uint64_t tms;
+	uint16_t local_counter = sequence_counter++;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000;
+	tms += ((uint64_t)tp.tv_usec) / 1000;
+
+	tms = pg_hton64(tms<<16);
+
+	/* Fill in time part */
+	memcpy(&uuid->data[0], &tms, 6);
+
+
+	/* fill everything after the timestamp and counter with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/* most significant 4 bits of counter */
+	uuid->data[6] = (unsigned char)(local_counter>>12);
+	/* next 8 bits */
+	uuid->data[7] = (unsigned char)(local_counter>>4);
+	/* least significant 4 bits in a middle of a byte, leaving 2 bits of entropy */
+	uuid->data[8] = (unsigned char)(local_counter<<2);
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6996073989..b11e0382c0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9119,6 +9119,9 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '3813', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c0b5..3141183b01 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -857,6 +857,7 @@ sha384(bytea)
 sha512(bytea)
 gen_random_uuid()
 starts_with(text,text)
+gen_uuid_v7()
 macaddr8_eq(macaddr8,macaddr8)
 macaddr8_lt(macaddr8,macaddr8)
 macaddr8_le(macaddr8,macaddr8)
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..fc9f50e69e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..02b8e7f10c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,11 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

#17

x4mmm@yandex-team.ru

over 2 years ago

In reply to: Andrey M. Borodin (#16)

1 attachment(s)

Re: UUID v7

On 30 Jul 2023, at 13:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

After a discussion on GitHub with Sergey Prokhorenko [0], I understood that the counter is an optional but useful part of UUID v7. It actually promotes sortability of data generated at high speed.
The standard does not specify how big the counter should be. PFA a patch with a 16-bit counter. Maybe it is worth doing an 18-bit counter: it would save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.

Here's a new patch version. The counter is now initialised with strong random data on every timestamp change (each ms). However, the first (left-most) bit of the counter is kept at zero. This is done to extend counter capacity (I left comments with a reference to the RFC with explanations).
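
As a rough illustration of the seeding described above (not the patch code itself), the sketch below builds an 18-bit counter from three random bytes and clears its left-most bit, so that at least 2^17 increments remain available within the same millisecond. The names seed_counter, COUNTER_BITS, COUNTER_MASK and COUNTER_TOP_BIT, as well as the byte layout, are assumptions made for this example; the attached patch packs the counter into uuid->data[6..8] instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch: an 18-bit counter. */
#define COUNTER_BITS	18
#define COUNTER_MASK	((UINT32_C(1) << COUNTER_BITS) - 1)		/* 0x3ffff */
#define COUNTER_TOP_BIT	(UINT32_C(1) << (COUNTER_BITS - 1))		/* 0x20000 */

/*
 * Seed the counter from three random bytes.  The left-most counter bit is
 * forced to zero, so the seeded value is at most 0x1ffff and the counter
 * can still be incremented at least 2^17 times before it overflows.
 */
static uint32_t
seed_counter(const unsigned char rnd[3])
{
	uint32_t	c;

	assert(rnd != NULL);

	c = ((uint32_t) rnd[0] << 16) | ((uint32_t) rnd[1] << 8) | rnd[2];
	c &= COUNTER_MASK;			/* keep only 18 bits */
	c &= ~COUNTER_TOP_BIT;		/* rollover guard bit stays zero */
	return c;
}
```

In a real backend the three bytes would come from a strong random source such as pg_strong_random(); here they are simply passed in so the function stays easy to test.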

Thanks!

Best regards, Andrey Borodin.

Attachments:

v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From f53c76291c2b832aab9bcac0dd96b05ad37c37cd Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v4] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   | 10 +++-
 src/backend/utils/adt/uuid.c             | 76 ++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  3 +
 src/test/regress/expected/opr_sanity.out |  1 +
 src/test/regress/expected/uuid.out       | 10 ++++
 src/test/regress/sql/uuid.sql            |  6 ++
 6 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..b2d89cf415 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13947,13 +13947,21 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..fed0b1bc52 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -421,3 +424,76 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+
+	tms = pg_hton64(tms<<16);
+
+	/* Fill in time part */
+	memcpy(&uuid->data[0], &tms, 6);
+
+	if (tms == previous_timestamp)
+	{
+		/* Time did not change from the previous generation, we must increment counter */
+		++sequence_counter;
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 12fac15ceb..4e6089060a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9125,6 +9125,9 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '3813', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c0b5..3141183b01 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -857,6 +857,7 @@ sha384(bytea)
 sha512(bytea)
 gen_random_uuid()
 starts_with(text,text)
+gen_uuid_v7()
 macaddr8_eq(macaddr8,macaddr8)
 macaddr8_lt(macaddr8,macaddr8)
 macaddr8_le(macaddr8,macaddr8)
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..fc9f50e69e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..02b8e7f10c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,11 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

#18

x4mmm@yandex-team.ru

over 2 years ago

In reply to: Andrey M. Borodin (#17)

3 attachment(s)

Re: UUID v7

On 20 Aug 2023, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch>

I've observed that pre-generating and buffering random numbers makes UUID generation more than 10 times faster.

Without buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 5286.572 ms (00:05.287)

With buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 390.091 ms

This can speed up gen_random_uuid() by a similar factor too. PFA an implementation of this technique.
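
For readers who want to try the idea outside the backend, here is a minimal standalone sketch of the buffering technique, under a few assumptions: buffered_random, random_source_fn, counting_source and refill_calls are names invented for this example, and counting_source is only a deterministic stand-in for a real source such as pg_strong_random(). Unlike the attached patch, this variant refills only when the request does not fit in the remaining bytes and passes oversized requests straight to the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RND_CACHE_LEN 512

/* Hypothetical callback type: fills buf with len random bytes. */
typedef bool (*random_source_fn) (void *buf, size_t len);

static unsigned char rnd_cache[RND_CACHE_LEN];
static size_t rnd_cache_pos = RND_CACHE_LEN;	/* starts "empty", forces a refill */

/* Stand-in source for the demo: NOT random, it only counts refills. */
static int	refill_calls = 0;

static bool
counting_source(void *buf, size_t len)
{
	memset(buf, 0xAB, len);
	refill_calls++;
	return true;
}

/*
 * Serve small requests from a pre-filled cache so that the expensive
 * source is called once per RND_CACHE_LEN bytes instead of once per UUID.
 */
static bool
buffered_random(random_source_fn source, void *buf, size_t len)
{
	assert(source != NULL && buf != NULL);

	/* Requests larger than the cache bypass it entirely. */
	if (len > RND_CACHE_LEN)
		return source(buf, len);

	/* Refill when the remaining bytes cannot satisfy the request. */
	if (len > RND_CACHE_LEN - rnd_cache_pos)
	{
		if (!source(rnd_cache, RND_CACHE_LEN))
			return false;
		rnd_cache_pos = 0;
	}

	memcpy(buf, rnd_cache + rnd_cache_pos, len);
	rnd_cache_pos += len;
	return true;
}
```

With 16-byte requests, one refill serves 32 calls before the source is consulted again.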

Best regards, Andrey Borodin.

Attachments:

v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From f53c76291c2b832aab9bcac0dd96b05ad37c37cd Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v5 1/3] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   | 10 +++-
 src/backend/utils/adt/uuid.c             | 76 ++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  3 +
 src/test/regress/expected/opr_sanity.out |  1 +
 src/test/regress/expected/uuid.out       | 10 ++++
 src/test/regress/sql/uuid.sql            |  6 ++
 6 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..b2d89cf415 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13947,13 +13947,21 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..fed0b1bc52 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -421,3 +424,76 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+
+	tms = pg_hton64(tms<<16);
+
+	/* Fill in time part */
+	memcpy(&uuid->data[0], &tms, 6);
+
+	if (tms == previous_timestamp)
+	{
+		/* Time did not change from the previous generation, we must increment counter */
+		++sequence_counter;
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 12fac15ceb..4e6089060a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9125,6 +9125,9 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '3813', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c0b5..3141183b01 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -857,6 +857,7 @@ sha384(bytea)
 sha512(bytea)
 gen_random_uuid()
 starts_with(text,text)
+gen_uuid_v7()
 macaddr8_eq(macaddr8,macaddr8)
 macaddr8_lt(macaddr8,macaddr8)
 macaddr8_le(macaddr8,macaddr8)
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..fc9f50e69e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..02b8e7f10c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,11 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

v5-0002-Buffer-random-numbers.patch (application/octet-stream)

From b8b61133c36babae861ec3d0f38597314308de93 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:34:55 +0300
Subject: [PATCH v5 2/3] Buffer random numbers

This allows to generate uuids 10 times faster
---
 src/backend/utils/adt/uuid.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index fed0b1bc52..a4e6349440 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -405,6 +405,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+#define UUID_RND_CACHE_LEN 512
+int rnd_cache_ptr = UUID_RND_CACHE_LEN;
+unsigned char random_cache[UUID_RND_CACHE_LEN];
+
+static bool
+cached_strong_random(void *buf, size_t len)
+{
+	if (len + rnd_cache_ptr >= UUID_RND_CACHE_LEN)
+	{
+		if (!pg_strong_random(random_cache, UUID_RND_CACHE_LEN))
+			return false;
+		rnd_cache_ptr = 0;
+	}
+	memcpy(buf, &random_cache[rnd_cache_ptr], len);
+	rnd_cache_ptr += len;
+	return true;
+}
+
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -428,7 +446,6 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 static uint32_t sequence_counter;
 static uint64_t previous_timestamp = 0;
 
-
 Datum
 gen_uuid_v7(PG_FUNCTION_ARGS)
 {
@@ -450,7 +467,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 		/* Time did not change from the previous generation, we must increment counter */
 		++sequence_counter;
 		/* fill everything after the timestamp and counter with random bytes */
-		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		if (!cached_strong_random(&uuid->data[8], UUID_LEN - 8))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
@@ -465,7 +482,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 	else
 	{
 		/* fill everything after the timestamp with random bytes */
-		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		if (!cached_strong_random(&uuid->data[6], UUID_LEN - 6))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch (application/octet-stream)

From 1e76676eff1c57a7cab049c2e2e4ed0a65fa6a5b Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:35:57 +0300
Subject: [PATCH v5 3/3] Use cached random numbers in gen_random_uuid() too

---
 src/backend/utils/adt/uuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index a4e6349440..2b1d6aaae3 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -428,7 +428,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
 
-	if (!pg_strong_random(uuid, UUID_LEN))
+	if (!cached_strong_random(uuid, UUID_LEN))
 		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

#19

x4mmm@yandex-team.ru

over 2 years ago

In reply to: Andrey M. Borodin (#18)

3 attachment(s)

Re: UUID v7

On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>

PFA the next version.
Changes:
- implemented protection against the clock jumping backwards while a series is generated on the same backend
- counter overflow is now translated into a one-millisecond step forward
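
The two changes above can be sketched in isolation roughly as follows. This is a simplified model, not the patch itself: V7State, v7_next and COUNTER_MAX are names assumed for the example, the current time and the counter seed are passed in as arguments, and the UUID byte packing is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTER_MAX UINT32_C(0x3ffff)	/* 18-bit counter */

/* Per-backend generator state (hypothetical struct for this sketch). */
typedef struct V7State
{
	uint64_t	prev_ms;		/* last timestamp that was handed out */
	uint32_t	counter;		/* counter value within prev_ms */
} V7State;

/*
 * Advance the state and return the millisecond timestamp to embed.
 *
 * If the clock did not move forward (same millisecond, or a jump backwards),
 * keep using prev_ms and increment the counter.  If the counter overflows,
 * reset it and step prev_ms forward by one millisecond.  Otherwise adopt the
 * new time and reseed the counter.
 */
static uint64_t
v7_next(V7State *st, uint64_t now_ms, uint32_t seed)
{
	assert(st != NULL);

	if (now_ms <= st->prev_ms)
	{
		st->counter++;
		if (st->counter > COUNTER_MAX)
		{
			st->counter = 0;
			st->prev_ms++;
		}
	}
	else
	{
		st->counter = seed & COUNTER_MAX;
		st->prev_ms = now_ms;
	}
	return st->prev_ms;
}
```

Because the returned timestamp never decreases and the counter only grows within one timestamp, the pair (timestamp, counter) is strictly increasing for a single backend in this model.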

Best regards, Andrey Borodin.

Attachments:

v6-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From 12bd390775c43a9ccf53451eae35c8930f97d481 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v6 1/3] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   | 10 ++-
 src/backend/utils/adt/uuid.c             | 89 ++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  3 +
 src/test/regress/expected/opr_sanity.out |  1 +
 src/test/regress/expected/uuid.out       | 10 +++
 src/test/regress/sql/uuid.sql            |  6 ++
 6 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..b2d89cf415 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -13947,13 +13947,21 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..7a493016c9 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -421,3 +424,89 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+
+	if (tms <= previous_timestamp)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 12fac15ceb..4e6089060a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9125,6 +9125,9 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '3813', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index a1bdf2c0b5..3141183b01 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -857,6 +857,7 @@ sha384(bytea)
 sha512(bytea)
 gen_random_uuid()
 starts_with(text,text)
+gen_uuid_v7()
 macaddr8_eq(macaddr8,macaddr8)
 macaddr8_lt(macaddr8,macaddr8)
 macaddr8_le(macaddr8,macaddr8)
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..fc9f50e69e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..02b8e7f10c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,11 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch (application/octet-stream)

From d94739169c1fee8e1c7a810ca51f34130453d98d Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:35:57 +0300
Subject: [PATCH v6 3/3] Use cached random numbers in gen_random_uuid() too

---
 src/backend/utils/adt/uuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 28a79a7590..2ea4f84c91 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -428,7 +428,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
 
-	if (!pg_strong_random(uuid, UUID_LEN))
+	if (!cached_strong_random(uuid, UUID_LEN))
 		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

v6-0002-Buffer-random-numbers.patch (application/octet-stream)

From 4be94504ab11fe44d3a14d0a7a4a64eb7167e6d4 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:34:55 +0300
Subject: [PATCH v6 2/3] Buffer random numbers

This allows to generate uuids 10 times faster
---
 src/backend/utils/adt/uuid.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 7a493016c9..28a79a7590 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -405,6 +405,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+#define UUID_RND_CACHE_LEN 512
+int rnd_cache_ptr = UUID_RND_CACHE_LEN;
+unsigned char random_cache[UUID_RND_CACHE_LEN];
+
+static bool
+cached_strong_random(void *buf, size_t len)
+{
+	if (len + rnd_cache_ptr >= UUID_RND_CACHE_LEN)
+	{
+		if (!pg_strong_random(random_cache, UUID_RND_CACHE_LEN))
+			return false;
+		rnd_cache_ptr = 0;
+	}
+	memcpy(buf, &random_cache[rnd_cache_ptr], len);
+	rnd_cache_ptr += len;
+	return true;
+}
+
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -428,7 +446,6 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 static uint32_t sequence_counter;
 static uint64_t previous_timestamp = 0;
 
-
 Datum
 gen_uuid_v7(PG_FUNCTION_ARGS)
 {
@@ -455,7 +472,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 		tms = previous_timestamp;
 
 		/* fill everything after the timestamp and counter with random bytes */
-		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		if (!cached_strong_random(&uuid->data[8], UUID_LEN - 8))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
@@ -470,7 +487,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 	else
 	{
 		/* fill everything after the timestamp with random bytes */
-		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		if (!cached_strong_random(&uuid->data[6], UUID_LEN - 6))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

#20

Mat Arye

mat@timescaledb.com

over 2 years ago

In reply to: Andrey M. Borodin (#19)

Re: UUID v7

Andrey,

Thanks for all your work on this. I think this will be really useful.

From a user perspective, it would be great to add 2 things:
- A function to extract the timestamp from a V7 UUID (very useful for
defining constraints if partitioning by the uuid-embedded timestamps, for
instance).
- Can we add an optional timestamptz argument to gen_uuid_v7 so that you
can explicitly specify a time instead of always generating for the current
time? If the argument is NULL, then use current time. This could be useful
for backfilling and other applications.
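
To make the first request concrete, here is a small sketch of what such an extraction could look like, assuming the layout used in the patches in this thread: the first 6 bytes hold the Unix time in milliseconds, big-endian, and the high nibble of byte 6 holds the version. The function names uuid_version and uuid_v7_unix_ms are hypothetical and not part of any patch here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version field: top four bits of byte 6. */
static int
uuid_version(const unsigned char uuid[16])
{
	assert(uuid != NULL);
	return uuid[6] >> 4;
}

/*
 * Return the 48-bit Unix timestamp in milliseconds stored big-endian in
 * bytes 0..5.  The caller is expected to check uuid_version() == 7 first;
 * for other versions the result is meaningless.
 */
static uint64_t
uuid_v7_unix_ms(const unsigned char uuid[16])
{
	uint64_t	ms = 0;

	assert(uuid != NULL);

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | uuid[i];
	return ms;
}
```

A SQL-level wrapper would then convert the millisecond value to timestamptz; that part is not shown.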

Thanks,
Matvey Arye
Timescale software developer.

On Wed, Aug 30, 2023 at 3:05 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>

FPA attached next version.
Changes:
- implemented protection from time leap backwards when series is generated
on the same backend
- counter overflow is now translated into ms step forward

Best regards, Andrey Borodin.

#21

x4mmm@yandex-team.ru

over 2 years ago

In reply to: Mat Arye (#20)

Re: UUID v7

Thanks for interesting ideas, Mat!

On 31 Aug 2023, at 20:32, Mat Arye <mat@timescaledb.com> wrote:

From a user perspective, it would be great to add 2 things:
- A function to extract the timestamp from a V7 UUID (very useful for defining constraints if partitioning by the uuid-embedded timestamps, for instance).

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

- Can we add an optional timestamptz argument to gen_uuid_v7 so that you can explicitly specify a time instead of always generating for the current time? If the argument is NULL, then use current time. This could be useful for backfilling and other applications.

I think this makes sense. We could also have a counter as an argument. I'll try to implement that.
However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any pointers here.

Best regards, Andrey Borodin.

#22

Chris Travers

chris.travers@gmail.com

over 2 years ago

In reply to: Andrey M. Borodin (#21)

Re: UUID v7

So I am in the process of reviewing the patch and hopefully can provide something there soon.

However, I want to address in the meantime the question of timestamp functions. I know that is outside the scope of this patch but I would be in favor of adding them generally, not just as an extension but eventually into core. I understand (and generally agree with) the logic of not generally extracting timestamps from UUIDs or other such fields, but there are cases where it is really, really helpful to be able to do so. In particular when you are troubleshooting misbehavior, all information you can get is helpful. And so extracting all of the subfields can be helpful.

The problem with putting this in an extension is that this is mostly useful when debugging systems (particularly larger distributed systems) and so the chances of it hitting a critical mass enough to be supported by all major cloud vendors is effectively zero.

So I am not asking for this to be included in this patch but I am saying I would love to see these sort of things contributed at some point to core.

#23

Nick Babadzhanian

pgnickb@gmail.com

over 2 years ago

In reply to: Andrey M. Borodin (#21)

Re: UUID v7

On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

Do you know of any reason for that?

However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any pointers here.

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

Also, a shameless plug with my extension for UUID v1 that implements
extract and create from (and an opclass):
https://github.com/pgnickb/uuid_v1_ops

#24

Jelte Fennema

postgres@jeltef.nl

over 2 years ago

In reply to: Nick Babadzhanian (#23)

Re: UUID v7

On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian <pgnickb@gmail.com> wrote:

On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

Do you know of any reason for that?

No reasons are given but the RFC states this:

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.

However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any pointers here.

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:


Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)

#25

Brad Peabody

brad@peabody.io

over 2 years ago

In reply to: Jelte Fennema (#24)

Re: UUID v7

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

Do you know of any reason for that?

I guess some of the detail may have been edited out over time with all of the changes, but it’s basically this: https://github.com/ietf-wg-uuidrev/rfc4122bis/blob/main/draft-ietf-uuidrev-rfc4122bis.md#opacity-opacity. The rationale is that when you introspect a UUID you essentially add interoperability concerns. E.g. if we say that applications can rely on being able to parse the timestamp from the UUID then it means that other implementations must provide guarantees about what that timestamp is. And since the point of a UUID is to provide a unique value, not to transmit additional metadata, the decision was made early on that it’s more realistic and representative of the reality of the situation to say that applications should generate values, try not to parse them if they don’t have to, but if they do it’s only going to be as accurate as the original data put into it. So systems with no NTP enabled, or that fuzz part of the time so as not to leak the exact moment in time something was done, etc - those are things that are going to happen and so buyer beware when parsing.

If the question is whether or not a function should exist to parse a timestamp from a UUID, I would say sure go ahead, just mention that the timestamp is only as accurate as the input, and the spec doesn’t guarantee anything if your UUID came from another source. I imagine a common case would be UUIDs generated within the same database, and someone wants to extract the timestamp, which would be as reliable as the timestamp on the database machine - seems like a perfectly good case where supporting timestamp extraction has practical value.


On Oct 9, 2023, at 11:11 AM, Jelte Fennema <postgres@jeltef.nl> wrote:

On Mon, 9 Oct 2023 at 18:46, Nick Babadzhanian <pgnickb@gmail.com> wrote:

On Thu, 31 Aug 2023 at 23:10, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Well, as far as I know, RFC discourages extracting timestamps from UUIDs. But we still can have such functions...maybe as an extension?

Do you know of any reason for that?

No reasons are given but the RFC states this:

UUIDs SHOULD be treated as opaque values and implementations SHOULD NOT examine the bits in a UUID to whatever extent is possible. However, where necessary, inspectors should refer to Section 4 for more information on determining UUID version and variant.

However, so far I haven't figured out how to implement optional arguments for catalog functions. I'd appreciate any pointers here.

I'd argue that the time argument shouldn't be optional. Asking the
user to supply time would force them to think whether they want to go
with `now()` or `clock_timestamp()` or something else.

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover. I think using a monotonic clock source is the
only reasonable thing to do. From the RFC:

Implementations SHOULD use the current timestamp from a reliable source to provide values that are time-ordered and continually increasing. Care SHOULD be taken to ensure that timestamp changes from the environment or operating system are handled in a way that is consistent with implementation requirements. For example, if it is possible for the system clock to move backward due to either manual adjustment or corrections from a time synchronization protocol, implementations must decide how to handle such cases. (See Altering, Fuzzing, or Smearing bullet below.)

#26

amborodin86@gmail.com

over 2 years ago

In reply to: Jelte Fennema (#24)

Re: UUID v7

On Mon, Oct 9, 2023 at 11:11 PM Jelte Fennema <postgres@jeltef.nl> wrote:

I think using `now()` is quite prone to sequence rollover. With the
current patch inserting more than 2^18~=0.26M rows into a table with
`gen_uuid_v7()` as the default in a single transaction would already
cause sequence rollover.

Well, the current patch will just use now()+1ms when 2^18 is
exhausted. That would hold even if now() were passed as an argument
(though the current patch does not support an argument).

Best regards, Andrey Borodin.

#27

x4mmm@yandex-team.ru

about 2 years ago

In reply to: Andrey Borodin (#26)

3 attachment(s)

Re: UUID v7

On 9 Oct 2023, at 23:46, Andrey Borodin <amborodin86@gmail.com> wrote:

Here's the next iteration of the patch. I've added get_uuid_v7_time().
This function extracts the timestamp from a UUID, iff it is v7. Timestamp correctness is only guaranteed if the timestamp was generated by the same implementation (6 bytes for milliseconds obtained by gettimeofday()).
Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than 1ms from now(). Maybe we should allow more tolerant values for slow test machines.

Best regards, Andrey Borodin.

Attachments:

v7-0001-Implement-UUID-v7-as-per-IETF-draft.patch

From 5e0127ec534fac1eda0218657ee358ae756a0e2c Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v7 1/3] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   |  18 +++-
 src/backend/utils/adt/uuid.c             | 114 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |   6 ++
 src/test/regress/expected/opr_sanity.out |   2 +
 src/test/regress/expected/uuid.out       |  22 +++++
 src/test/regress/sql/uuid.sql            |  14 +++
 6 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cec21e42c0..7a1b728bed 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>get_uuid_v7_time</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
+<synopsis>
+<function>get_uuid_v7_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID version 7.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..3455b9f564 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -20,6 +23,7 @@
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +425,113 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+
+	if (tms <= previous_timestamp)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+get_uuid_v7_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if (((uuid->data[6] & 0xf0) != 0x70)
+		|| ((uuid->data[8] & 0xc0) != 0x80))
+		elog(ERROR,"get_uuid_v7_time() can only extract timestamp from UUID v7");
+
+	tms =			  uuid->data[5];
+	tms += ((uint64_t)uuid->data[4]) << 8;
+	tms += ((uint64_t)uuid->data[3]) << 16;
+	tms += ((uint64_t)uuid->data[2]) << 24;
+	tms += ((uint64_t)uuid->data[1]) << 32;
+	tms += ((uint64_t)uuid->data[0]) << 40;
+
+	ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	PG_RETURN_TIMESTAMPTZ(ts);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b67784731..209d1bd0ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,12 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
+{ oid => '9896', descr => 'extract timestamp from UUID version 7',
+  proname => 'get_uuid_v7_time', proleakproof => 't', provolatile => 'i',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'get_uuid_v7_time' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..853d3574a3 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,8 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+gen_uuid_v7()
+get_uuid_v7_time(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..f68af659a0 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+WITH uuid_time_extraction AS
+(SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
+SELECT (d <= '1ms') AND (d >= '-1ms') FROM uuid_time_extraction;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- get_uuid_v7_time() must refuse to accept non-UUIDv7
+select get_uuid_v7_time(gen_random_uuid());
+ERROR:  get_uuid_v7_time() can only extract timestamp from UUID v7
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..6843e17e88 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+WITH uuid_time_extraction AS
+(SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
+SELECT (d <= '1ms') AND (d >= '-1ms') FROM uuid_time_extraction;
+
+-- get_uuid_v7_time() must refuse to accept non-UUIDv7
+select get_uuid_v7_time(gen_random_uuid());
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

v7-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch

From 594c2fda15c3a647b4b5a423a110821344e98995 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:35:57 +0300
Subject: [PATCH v7 3/3] Use cached random numbers in gen_random_uuid() too

---
 src/backend/utils/adt/uuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 6b1ea457cb..af88ec2490 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -429,7 +429,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
 
-	if (!pg_strong_random(uuid, UUID_LEN))
+	if (!cached_strong_random(uuid, UUID_LEN))
 		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

v7-0002-Buffer-random-numbers.patch

From dc1691ffa968b485adf8587b03f2be12f2e4e4f2 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:34:55 +0300
Subject: [PATCH v7 2/3] Buffer random numbers

This allows to generate uuids 10 times faster
---
 src/backend/utils/adt/uuid.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 3455b9f564..6b1ea457cb 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -406,6 +406,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+#define UUID_RND_CACHE_LEN 512
+int rnd_cache_ptr = UUID_RND_CACHE_LEN;
+unsigned char random_cache[UUID_RND_CACHE_LEN];
+
+static bool
+cached_strong_random(void *buf, size_t len)
+{
+	if (len + rnd_cache_ptr >= UUID_RND_CACHE_LEN)
+	{
+		if (!pg_strong_random(random_cache, UUID_RND_CACHE_LEN))
+			return false;
+		rnd_cache_ptr = 0;
+	}
+	memcpy(buf, &random_cache[rnd_cache_ptr], len);
+	rnd_cache_ptr += len;
+	return true;
+}
+
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -429,7 +447,6 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 static uint32_t sequence_counter;
 static uint64_t previous_timestamp = 0;
 
-
 Datum
 gen_uuid_v7(PG_FUNCTION_ARGS)
 {
@@ -456,7 +473,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 		tms = previous_timestamp;
 
 		/* fill everything after the timestamp and counter with random bytes */
-		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		if (!cached_strong_random(&uuid->data[8], UUID_LEN - 8))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
@@ -471,7 +488,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 	else
 	{
 		/* fill everything after the timestamp with random bytes */
-		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		if (!cached_strong_random(&uuid->data[6], UUID_LEN - 6))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

#28

x4mmm@yandex-team.ru

about 2 years ago

In reply to: Andrey M. Borodin (#27)

3 attachment(s)

Re: UUID v7

On 2 Jan 2024, at 14:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Tests verify that get_uuid_v7_time(gen_uuid_v7()) differs no more than 1ms from now(). Maybe we should allow more tolerant values for slow test machines.

Indeed, CFbot complained about flaky tests. I've increased the test tolerance to 100ms (this does not affect test time).

Best regards, Andrey Borodin.

Attachments:

v8-0001-Implement-UUID-v7-as-per-IETF-draft.patch

From 175dff14da96fe531c3c42fba114642236a9dd94 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v8 1/3] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   |  18 +++-
 src/backend/utils/adt/uuid.c             | 114 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |   6 ++
 src/test/regress/expected/opr_sanity.out |   2 +
 src/test/regress/expected/uuid.out       |  22 +++++
 src/test/regress/sql/uuid.sql            |  14 +++
 6 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cec21e42c0..7a1b728bed 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>get_uuid_v7_time</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
+<synopsis>
+<function>get_uuid_v7_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID version 7.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f7aa768fd..3455b9f564 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -20,6 +23,7 @@
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +425,113 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+
+	if (tms <= previous_timestamp)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+get_uuid_v7_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if (((uuid->data[6] & 0xf0) != 0x70)
+		|| ((uuid->data[8] & 0xc0) != 0x80))
+		elog(ERROR,"get_uuid_v7_time() can only extract timestamp from UUID v7");
+
+	tms =			  uuid->data[5];
+	tms += ((uint64_t)uuid->data[4]) << 8;
+	tms += ((uint64_t)uuid->data[3]) << 16;
+	tms += ((uint64_t)uuid->data[2]) << 24;
+	tms += ((uint64_t)uuid->data[1]) << 32;
+	tms += ((uint64_t)uuid->data[0]) << 40;
+
+	ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	PG_RETURN_TIMESTAMPTZ(ts);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b67784731..209d1bd0ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,12 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
+{ oid => '9896', descr => 'extract timestamp from UUID version 7',
+  proname => 'get_uuid_v7_time', proleakproof => 't', provolatile => 'i',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'get_uuid_v7_time' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..853d3574a3 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,8 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+gen_uuid_v7()
+get_uuid_v7_time(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..dfb446b4cb 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+WITH uuid_time_extraction AS
+(SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
+SELECT (d <= '100ms') AND (d >= '-100ms') FROM uuid_time_extraction;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- get_uuid_v7_time() must refuse to accept non-UUIDv7
+select get_uuid_v7_time(gen_random_uuid());
+ERROR:  get_uuid_v7_time() can only extract timestamp from UUID v7
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..dccb8335cd 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+WITH uuid_time_extraction AS
+(SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
+SELECT (d <= '100ms') AND (d >= '-100ms') FROM uuid_time_extraction;
+
+-- get_uuid_v7_time() must refuse to accept non-UUIDv7
+select get_uuid_v7_time(gen_random_uuid());
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

v8-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch (application/octet-stream)

From f32820ed2642dd17c623232e3999c0d39f5cf2f0 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:35:57 +0300
Subject: [PATCH v8 3/3] Use cached random numbers in gen_random_uuid() too

---
 src/backend/utils/adt/uuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 6b1ea457cb..af88ec2490 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -429,7 +429,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
 
-	if (!pg_strong_random(uuid, UUID_LEN))
+	if (!cached_strong_random(uuid, UUID_LEN))
 		ereport(ERROR,
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)

v8-0002-Buffer-random-numbers.patch (application/octet-stream)

From 68370cc8c9d1079c0ef5617ef9ca5a67afb30b30 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Mon, 21 Aug 2023 11:34:55 +0300
Subject: [PATCH v8 2/3] Buffer random numbers

This allows generating UUIDs 10 times faster
---
 src/backend/utils/adt/uuid.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 3455b9f564..6b1ea457cb 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -406,6 +406,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+#define UUID_RND_CACHE_LEN 512
+int rnd_cache_ptr = UUID_RND_CACHE_LEN;
+unsigned char random_cache[UUID_RND_CACHE_LEN];
+
+static bool
+cached_strong_random(void *buf, size_t len)
+{
+	if (len + rnd_cache_ptr >= UUID_RND_CACHE_LEN)
+	{
+		if (!pg_strong_random(random_cache, UUID_RND_CACHE_LEN))
+			return false;
+		rnd_cache_ptr = 0;
+	}
+	memcpy(buf, &random_cache[rnd_cache_ptr], len);
+	rnd_cache_ptr += len;
+	return true;
+}
+
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -429,7 +447,6 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 static uint32_t sequence_counter;
 static uint64_t previous_timestamp = 0;
 
-
 Datum
 gen_uuid_v7(PG_FUNCTION_ARGS)
 {
@@ -456,7 +473,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 		tms = previous_timestamp;
 
 		/* fill everything after the timestamp and counter with random bytes */
-		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		if (!cached_strong_random(&uuid->data[8], UUID_LEN - 8))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
@@ -471,7 +488,7 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 	else
 	{
 		/* fill everything after the timestamp with random bytes */
-		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		if (!cached_strong_random(&uuid->data[6], UUID_LEN - 6))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
 					errmsg("could not generate random values")));
-- 
2.37.1 (Apple Git-137.1)
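
The buffering idea in the patch above can be tried outside the backend with a minimal sketch like the one below. It is only an illustration under stated assumptions: demo_random_source() is a hypothetical, non-cryptographic stand-in for pg_strong_random(), and source_calls exists purely to observe refills. Unlike the patch, the sketch uses a strict comparison, so a request that exactly fits the remaining bytes is served from the cache instead of triggering a refill, and it passes oversized requests straight through.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RND_CACHE_LEN 512

static unsigned char random_cache[RND_CACHE_LEN];
static size_t rnd_cache_pos = RND_CACHE_LEN;	/* cache starts out empty */
static unsigned source_calls = 0;	/* demo instrumentation only */

/*
 * Hypothetical stand-in for pg_strong_random(): a toy generator that is
 * NOT cryptographically secure and exists only to make the sketch run.
 */
static bool
demo_random_source(void *buf, size_t len)
{
	static unsigned state = 12345u;
	unsigned char *p = buf;

	for (size_t i = 0; i < len; i++)
	{
		state = state * 1103515245u + 12345u;
		p[i] = (unsigned char) (state >> 16);
	}
	source_calls++;
	return true;
}

/* Serve len bytes from the cache, refilling only when it runs short. */
static bool
cached_random(void *buf, size_t len)
{
	if (len > RND_CACHE_LEN)
		return demo_random_source(buf, len);	/* too large to cache */

	if (len > RND_CACHE_LEN - rnd_cache_pos)
	{
		if (!demo_random_source(random_cache, RND_CACHE_LEN))
			return false;
		rnd_cache_pos = 0;
	}
	memcpy(buf, &random_cache[rnd_cache_pos], len);
	rnd_cache_pos += len;
	return true;
}
```

With a 512-byte cache, thirty-two 16-byte requests are served from a single refill; the thirty-third triggers the next one.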

#29

przemyslaw@sztoch.pl

about 2 years ago

In reply to: Andrey M. Borodin (#28)

Re: Re: UUID v7

Dear Andrey,

1. Is it possible to add a function that returns the version of the
generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID
that inform about the version.

2. If there is any doubt about adding the function to the main sources
(standard development in progress), in my opinion you can definitely add
this function to the uuid-ossp extension.

3. Wouldn't it be worth including UUID version 6 as well?

4. Sometimes you will need to generate a UUID for a historical time. There
should be an additional function gen_uuid_v7(timestamp).

Nevertheless, the need for UUID v6/7/8 is very high and I'm glad it's
coming to PostgreSQL. It should land in PG17.
--
Przemysław Sztoch | Mobile +48 509 99 00 66

#30

x4mmm@yandex-team.ru

about 2 years ago

In reply to: Przemysław Sztoch (#29)

2 attachment(s)

Re: UUID v7

Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.

What do you think about having functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract the bit fields according to [0]? Or perhaps this should be one function with two return parameters?
It's not in a patch yet; I'm just considering what this functionality should look like.

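2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.

From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or am I looking in the wrong place?

3. Wouldn't it be worth including UUID version 6 as well?

The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6, I'm OK to do so.

4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).

Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?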

Thanks!

Best regards, Andrey Borodin.

[0]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1]: http://www.ossp.org/

Attachments:

v8-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From b3f82cc9f95d8e9193cd98a23be40dd864310673 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v8 1/2] Implement UUID v7 as per IETF draft

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   |  18 +++-
 src/backend/utils/adt/uuid.c             | 114 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |   6 ++
 src/test/regress/expected/opr_sanity.out |   2 +
 src/test/regress/expected/uuid.out       |  22 +++++
 src/test/regress/sql/uuid.sql            |  14 +++
 6 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cec21e42c0..7a1b728bed 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>get_uuid_v7_time</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
+<synopsis>
+<function>get_uuid_v7_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID version 7.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..ce4be60698 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -20,6 +23,7 @@
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +425,113 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+
+	gettimeofday(&tp, NULL);
+
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+
+	if (tms <= previous_timestamp)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+get_uuid_v7_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if (((uuid->data[6] & 0xf0) != 0x70)
+		|| ((uuid->data[8] & 0xc0) != 0x80))
+		elog(ERROR,"get_uuid_v7_time() can only extract timestamp from UUID v7");
+
+	tms =			  uuid->data[5];
+	tms += ((uint64_t)uuid->data[4]) << 8;
+	tms += ((uint64_t)uuid->data[3]) << 16;
+	tms += ((uint64_t)uuid->data[2]) << 24;
+	tms += ((uint64_t)uuid->data[1]) << 32;
+	tms += ((uint64_t)uuid->data[0]) << 40;
+
+	ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	PG_RETURN_TIMESTAMPTZ(ts);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7979392776..21560c0e81 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,12 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 7',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
+{ oid => '9896', descr => 'extract timestamp from UUID version 7',
+  proname => 'get_uuid_v7_time', proleakproof => 't', provolatile => 'i',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'get_uuid_v7_time' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..853d3574a3 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,8 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+gen_uuid_v7()
+get_uuid_v7_time(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..dfb446b4cb 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+WITH uuid_time_extraction AS
+(SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
+SELECT (d <= '100ms') AND (d >= '-100ms') FROM uuid_time_extraction;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- get_uuid_v7_time() must refuse to accept non-UUIDv7
+select get_uuid_v7_time(gen_random_uuid());
+ERROR:  get_uuid_v7_time() can only extract timestamp from UUID v7
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..dccb8335cd 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+WITH uuid_time_extraction AS
+(SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
+SELECT (d <= '100ms') AND (d >= '-100ms') FROM uuid_time_extraction;
+
+-- get_uuid_v7_time() must refuse to accept non-UUIDv7
+select get_uuid_v7_time(gen_random_uuid());
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)
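
For readers following the counter layout in the patch above: the 18-bit counter is spread over the low nibble of byte 6, all of byte 7, and the low six bits of byte 8, with the version and variant bits occupying the remaining positions. A minimal standalone sketch of that packing, using the hypothetical helper names pack_counter() and unpack_counter() (they do not appear in the patch), might look like this:

```c
#include <stdint.h>

/* Store an 18-bit counter next to the v7 version and variant bits. */
static void
pack_counter(uint8_t d[16], uint32_t counter)
{
	d[6] = (uint8_t) (0x70 | ((counter >> 14) & 0x0f));	/* version 7 + top 4 bits */
	d[7] = (uint8_t) (counter >> 6);						/* middle 8 bits */
	d[8] = (uint8_t) (0x80 | (counter & 0x3f));			/* variant 10 + low 6 bits */
}

/* Recover the 18-bit counter from bytes 6..8. */
static uint32_t
unpack_counter(const uint8_t d[16])
{
	return ((uint32_t) (d[6] & 0x0f) << 14) |
		((uint32_t) d[7] << 6) |
		(uint32_t) (d[8] & 0x3f);
}
```

Because the more significant counter bits land in earlier bytes, a larger counter yields a byte-wise larger UUID for the same millisecond, which is what keeps values sortable within one timestamp.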

v8-0002-Add-optional-argument-unix_ts_ms-to-gen_uuid_v7.patch (application/octet-stream)

From 9936f08868ef06272a684c6e3fefb09109e21fe0 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Thu, 4 Jan 2024 23:01:15 +0500
Subject: [PATCH v8 2/2] Add optional argument unix_ts_ms to gen_uuid_v7()

This would allow generating k-way sorted UUIDs.
---
 src/backend/utils/adt/pseudotypes.c      | 12 ++++++++++--
 src/backend/utils/adt/uuid.c             | 22 ++++++++++++++++++----
 src/include/catalog/pg_proc.dat          |  6 ++++--
 src/test/regress/expected/opr_sanity.out |  2 +-
 src/test/regress/expected/uuid.out       | 10 ++++++++++
 src/test/regress/sql/uuid.sql            |  6 ++++++
 6 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index a3a991f634..82c2d1309d 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "libpq/pqformat.h"
+#include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/rangetypes.h"
@@ -332,11 +333,18 @@ shell_out(PG_FUNCTION_ARGS)
  *
  * We must disallow input of pg_node_tree values because the SQL functions
  * that operate on the type are not secure against malformed input.
- * We do want to allow output, though.
+ * We do want to allow output, though. Also we need input during bootstrap.
  */
-PSEUDOTYPE_DUMMY_INPUT_FUNC(pg_node_tree);
 PSEUDOTYPE_DUMMY_RECEIVE_FUNC(pg_node_tree);
 
+Datum
+pg_node_tree_in(PG_FUNCTION_ARGS)
+{
+	if (!IsBootstrapProcessingMode())
+		elog(ERROR, "cannot accept a value of type pg_node_tree");
+	return textin(fcinfo);
+}
+
 Datum
 pg_node_tree_out(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index ce4be60698..9e07987045 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -436,12 +436,26 @@ gen_uuid_v7(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
 	uint64_t tms;
 	struct timeval tp;
+	bool increment_counter;
 
-	gettimeofday(&tp, NULL);
-
-	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = tms <= previous_timestamp;
+	}
+	else
+	{
+		tms = PG_GETARG_INT64(0);
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+	}
 
-	if (tms <= previous_timestamp)
+	if (increment_counter)
 	{
 		/* Time did not increment from the previous generation, we must increment counter */
 		++sequence_counter;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 21560c0e81..873ea90871 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,9 +9174,11 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
-{ oid => '9895', descr => 'generate UUID version 7',
+{ oid => '9895', descr => 'generate UUID version 7', proisstrict => 'f',
   proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
-  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_uuid_v7' },
+  prorettype => 'uuid', proargtypes => 'int8', prosrc => 'gen_uuid_v7',
+  proargnames => '{unix_ts_ms}', pronargdefaults => 1, proargmodes => '{i}',
+  proargdefaults => '({CONST :consttype 20 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull true :location 47 :constvalue <>})' },
 { oid => '9896', descr => 'extract timestamp from UUID version 7',
   proname => 'get_uuid_v7_time', proleakproof => 't', provolatile => 'i',
   prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'get_uuid_v7_time' },
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 853d3574a3..3fa90ae59e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,7 +872,7 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
-gen_uuid_v7()
+gen_uuid_v7(bigint)
 get_uuid_v7_time(uuid)
 -- restore normal output mode
 \a\t
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index dfb446b4cb..cdad0024a6 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -178,6 +178,16 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(12345));
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(12345));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- check that timestamp is extracted correctly
 WITH uuid_time_extraction AS
 (SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index dccb8335cd..a2de91feed 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -91,6 +91,12 @@ INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
 INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(12345));
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(12345));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 -- check that timestamp is extracted correctly
 WITH uuid_time_extraction AS
 (SELECT get_uuid_v7_time(gen_uuid_v7())-now() d)
-- 
2.37.1 (Apple Git-137.1)
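
The decision logic in the patch above (bump the counter when the clock did not advance, but only on an exact match when the caller supplies the time) can be isolated into a small state machine. The sketch below is an illustration, not the patch's code: V7State, v7_next() and the seed parameter are hypothetical names, and the random seed is passed in by the caller so the behaviour is deterministic.

```c
#include <stdbool.h>
#include <stdint.h>

#define COUNTER_MAX 0x3ffffu	/* 18-bit counter */

typedef struct
{
	uint64_t	prev_ms;	/* timestamp embedded in the previous UUID */
	uint32_t	counter;	/* counter embedded in the previous UUID */
} V7State;

/*
 * Advance the generator state and return the millisecond value to embed.
 * from_clock distinguishes a system-clock reading from a user-supplied time.
 */
static uint64_t
v7_next(V7State *st, uint64_t tms, bool from_clock, uint32_t seed)
{
	bool		bump = from_clock ? (tms <= st->prev_ms) : (tms == st->prev_ms);

	if (bump)
	{
		if (++st->counter > COUNTER_MAX)
		{
			/* counter exhausted: borrow the next millisecond */
			st->counter = 0;
			st->prev_ms++;
		}
	}
	else
	{
		/* new timestamp: reseed with the top counter bit cleared */
		st->counter = seed & (COUNTER_MAX >> 1);
		st->prev_ms = tms;
	}
	return st->prev_ms;
}
```

Clearing the top bit on reseed leaves at least 2^17 increments before the counter overflows into the next millisecond.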

#31

postgres@jeltef.nl

about 2 years ago

In reply to: Andrey M. Borodin (#30)

Re: UUID v7

First of all, I'm a huge fan of UUID v7. So I'm very excited that this
is progressing. I'm definitely going to look closer at this patch
soon. Some tiny initial feedback:

(bikeshed) I'd prefer renaming `get_uuid_v7_time` to the shorter
`uuid_v7_time`, the `get_` prefix seems rarely used in Postgres
functions (e.g. `date_part` is not called `get_date_part`). Also it's
visually very similar to the gen_ prefix.

On Thu, 4 Jan 2024 at 19:20, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.

What do you think about having functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract the bit fields according to [0]? Or perhaps this should be one function with two return parameters?
It's not in a patch yet; I'm just considering what this functionality should look like.

I do agree that those functions would be useful, especially now that
we're introducing a function that errors when it's passed a UUID
that's not of version 7. With the version extraction function you
could return something else for other uuids if you have many and not
all of them are version 7.

I do think though that these functions should not have v7 in their
name, since they would apply to all uuids of all versions (so if also
removing the get_ prefix they would be called uuid_ver and uuid_var)

4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).

Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?

I think timestamp would be quite useful. gen_uuid_v7(timestamp) would encode
the time in the same way as gen_uuid_v7() would, but based on the given
time instead of the current time.
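
To make the "same encoding, different time source" point concrete, here is a minimal sketch of how a millisecond value ends up in a version 7 UUID regardless of where it came from. The helper names v7_set_unix_ms() and v7_get_unix_ms() are hypothetical; the byte layout (48-bit big-endian Unix milliseconds in bytes 0 to 5, version nibble in byte 6, variant bits in byte 8) follows the patches in this thread.

```c
#include <stdint.h>

/* Write a 48-bit Unix millisecond timestamp plus version/variant bits. */
static void
v7_set_unix_ms(uint8_t d[16], uint64_t ms)
{
	for (int i = 0; i < 6; i++)
		d[i] = (uint8_t) (ms >> (8 * (5 - i)));	/* big-endian */

	d[6] = (uint8_t) ((d[6] & 0x0f) | 0x70);	/* version 7 */
	d[8] = (uint8_t) ((d[8] & 0x3f) | 0x80);	/* variant 10 */
}

/* Read the 48-bit timestamp back out of bytes 0..5. */
static uint64_t
v7_get_unix_ms(const uint8_t d[16])
{
	uint64_t	ms = 0;

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | d[i];
	return ms;
}
```

The remaining bits of the buffer are left untouched, so a caller would fill them with random data or a counter before or after setting the timestamp.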

#32

przemyslaw@sztoch.pl

about 2 years ago

In reply to: Jelte Fennema-Nio (#31)

Re: UUID v7

Andrey M. Borodin wrote on 1/4/2024 7:20 PM:

Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.

What do you think about having functions get_uuid_v7_ver(uuid) and get_uuid_v7_var(uuid) to extract the bit fields according to [0]? Or perhaps this should be one function with two return parameters?
It's not in a patch yet; I'm just considering what this functionality should look like.

uuid_ver(uuid) -> smallint/integer 1/3/4/5/6/7/8

Of course there is the RFC 4122 variant, "bits: 10x". If it is another
variant, then uuid_ver should return -1 OR NULL.
For UUIDs generated by your patch this function should always return 7.
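
As a rough illustration of the proposal above, a version extractor only needs two bytes of the UUID. The sketch below uses the hypothetical name uuid_version() and picks the "-1 for a foreign variant" option; a real SQL function could return NULL instead.

```c
#include <stdint.h>

/*
 * Return the version number (upper nibble of byte 6) for UUIDs of the
 * RFC 4122 variant, i.e. whose byte 8 starts with bits "10". Any other
 * variant yields -1.
 */
static int
uuid_version(const uint8_t d[16])
{
	if ((d[8] & 0xc0) != 0x80)
		return -1;
	return d[6] >> 4;
}
```

Under this rule the nil UUID (all zero bytes) reports -1, since its variant bits are 00.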

2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.

From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or am I looking in the wrong place?

After two days of thinking about UUID v7, I consider it a very important
functionality that should be included in PG17.

3. Wouldn't it be worth including UUID version 6 as well?

The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6 - I'm OK to do so.

The IETF standard should provide information about the possibility of
converting from v1 to v6.
Then the usefulness of v6 is much greater and it would be worth
implementing this version as well.
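
A v1 to v6 conversion is a pure bit shuffle: both versions carry the same 60-bit timestamp, but v6 stores it most significant bits first. The following is a sketch under that assumption, with the hypothetical name uuid_v1_to_v6(); it rejects anything that is not a version 1 UUID of the "10x" variant and copies the clock sequence and node bytes unchanged.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool
uuid_v1_to_v6(const uint8_t v1[16], uint8_t v6[16])
{
	uint64_t	time_low;
	uint64_t	time_mid;
	uint64_t	time_hi;
	uint64_t	ts;

	if ((v1[6] >> 4) != 1 || (v1[8] & 0xc0) != 0x80)
		return false;			/* not a version 1 UUID */

	/* Reassemble the 60-bit timestamp from the v1 field order. */
	time_low = ((uint64_t) v1[0] << 24) | ((uint64_t) v1[1] << 16) |
		((uint64_t) v1[2] << 8) | v1[3];
	time_mid = ((uint64_t) v1[4] << 8) | v1[5];
	time_hi = ((uint64_t) (v1[6] & 0x0f) << 8) | v1[7];
	ts = (time_hi << 48) | (time_mid << 32) | time_low;

	/* Clock sequence and node are identical in both layouts. */
	memmove(&v6[8], &v1[8], 8);

	/* v6 stores the same timestamp most significant bits first. */
	v6[0] = (uint8_t) (ts >> 52);
	v6[1] = (uint8_t) (ts >> 44);
	v6[2] = (uint8_t) (ts >> 36);
	v6[3] = (uint8_t) (ts >> 28);
	v6[4] = (uint8_t) (ts >> 20);
	v6[5] = (uint8_t) (ts >> 12);
	v6[6] = (uint8_t) (0x60 | ((ts >> 8) & 0x0f));	/* version 6 */
	v6[7] = (uint8_t) ts;
	return true;
}
```

For example, the v1 value C232AB00-9414-11EC-B3C8-9F6BDECED846 maps to 1EC9414C-232A-6B00-B3C8-9F6BDECED846.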

4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).

Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?

I talked to my colleagues and everyone prefers the timestamp version.
If the timestamp is outside the allowed range, the function must raise an
error.
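
What "outside the allowed range" means can be sketched in a few lines. PostgreSQL stores timestamptz as signed 64-bit microseconds since 2000-01-01, while UUID v7 holds unsigned 48-bit milliseconds since the Unix epoch, so values before 1970 or too far in the future cannot be represented. The function name pg_timestamp_to_v7_ms() and both macros are hypothetical; a real implementation would raise an error where this sketch returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Microseconds between 1970-01-01 and 2000-01-01 (10957 days). */
#define PG_TO_UNIX_EPOCH_US (INT64_C(946684800) * 1000000)
/* Largest value that fits the 48-bit unix_ts_ms field. */
#define V7_MAX_UNIX_MS INT64_C(0xFFFFFFFFFFFF)

static bool
pg_timestamp_to_v7_ms(int64_t pg_us, uint64_t *unix_ms)
{
	int64_t		unix_us;

	if (pg_us > INT64_MAX - PG_TO_UNIX_EPOCH_US)
		return false;			/* epoch shift would overflow */

	unix_us = pg_us + PG_TO_UNIX_EPOCH_US;
	if (unix_us < 0 || unix_us / 1000 > V7_MAX_UNIX_MS)
		return false;			/* before 1970 or beyond 48 bits */

	*unix_ms = (uint64_t) (unix_us / 1000);
	return true;
}
```

Sub-millisecond precision is simply truncated here; whether to round instead is a design choice the thread does not settle.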

We also talked about uuid-ossp. Still, v5 is a great solution in some
applications.
It is worth moving this function from the extension into PG17. Many people
don't use it because they don't know about it or about this UUID scheme.

We think it would be quite reasonable to add:
uuid_generate_v5 (namespace uuid, name text) -> uuid
uuid_generate_v6 () -> uuid
uuid_generate_v6 (timestamptz) -> uuid
uuid_generate_v7() -> uuid
uuid_generate_v7(timestamptz) -> uuid
uuid_ver(uuid) -> smallint -1/1/2/3/4/5/6/7/8
uuid_ts(uuid) -> timestamptz (for 1/6/7 version, for other should return
NULL, error is too heavy in our opinion)
uuid_v1_to_v6 (uuid) -> uuid

The naming of this family of functions needs to be rethought.
Do we adopt the naming convention of the uuid-ossp extension (uuid_generate_vN)?
Or should we continue with the slightly less precise naming already used in PG
(gen_random_uuid, gen_uuid_v7)?

5. Please add a reference to RFC 4122 to the docs
(https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuid)
People should read standards. :-)

Thanks!

Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1] http://www.ossp.org/

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#33

sergeyprokhorenko@yahoo.com.au

about 2 years ago

In reply to: Przemysław Sztoch (#32)

Re: UUID v7

Hello Przemysław and Andrey,
When naming functions, I would advise using the shorter abbreviation uuidv7 from the new version of the RFC instead of uuid_v7. When people search Google for new versions of UUIDs, they enter the abbreviation uuidv7 into the search bar. The name generate_uuidv7() looks good, as well as uuidv1_to_uuidv6() and timestamp_to_uuidv7().
Best regards,

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Friday, 5 January 2024 at 11:53:04 am GMT+3, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

Andrey M. Borodin wrote on 1/4/2024 7:20 PM:

Hello Przemysław,

thanks for your interest in this patch!

On 3 Jan 2024, at 04:37, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

1. Is it possible to add a function that returns the version of the generated uuid?
It will be very useful.
I don't know if it's possible, but I think there are bits in the UUID that inform about the version.

Of course there is the RFC 4122 variant, "bits: 10x". If it is another variant, then uuid_ver should return -1 OR NULL.
For UUIDs generated by your patch this function should always return 7.

2. If there is any doubt about adding the function to the main sources (standard development in progress), in my opinion you can definitely add this function to the uuid-ossp extension.

From my POV we can just have this function in the core. OSSP support for UUID seems more or less dead [1]: "Newsflash: 04-Jul-2008: Released OSSP uuid 1.6.2". Or am I looking in the wrong place?
After two days of thinking about UUID v7, I consider it a very important functionality that should be included in PG17.

3. Wouldn't it be worth including UUID version 6 as well?

The standard in [0] says "Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 Section 5.7 instead." If there's a point in developing v6 - I'm OK to do so.
The IETF standard should provide information about the possibility of converting from v1 to v6.
Then the usefulness of v6 is much greater and it would be worth implementing this version as well.

4. Sometimes you will need to generate a uuid for historical time. There should be an additional function gen_uuid_v7(timestamp).

Done, please see patch attached. But I changed signature to gen_uuid_v7(int8), to avoid messing with bytes from user who knows what they want. Or do you think gen_uuid_v7(timestamp) would be more convenient?
I talked to my colleagues and everyone prefers the timestamp version.
If the timestamp is outside the allowed range, the function must raise an error.

We also talked about uuid-ossp. Still, v5 is a great solution in some applications.
It is worth moving this function from the extension into PG17. Many people don't use it because they don't know about it or about this UUID scheme.

We think it would be quite reasonable to add:
uuid_generate_v5 (namespace uuid, name text) -> uuid
uuid_generate_v6 () -> uuid
uuid_generate_v6 (timestamptz) -> uuid
uuid_generate_v7() -> uuid
uuid_generate_v7(timestamptz) -> uuid
uuid_ver(uuid) -> smallint -1/1/2/3/4/5/6/7/8
uuid_ts(uuid) -> timestamptz (for 1/6/7 version, for other should return NULL, error is too heavy in our opinion)
uuid_v1_to_v6 (uuid) -> uuid

5. Please add a reference to RFC 4122 to the docs (https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuid)
People should read standards. :-)

Thanks!

Best regards, Andrey Borodin.

[0]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#uuidv7
[1]: http://www.ossp.org/

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#34

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Sergey Prokhorenko (#33)

1 attachment(s)

Re: UUID v7

On 5 Jan 2024, at 15:57, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()
2. Renamed get_uuid_v7_time() to uuid_v7_time()
3. Added uuid_ver() and uuid_var().

What do you think?

Best regards, Andrey Borodin.

Attachments:

v9-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From 25460815ccbfcd6f86d39beddc6c0aa7005fc0a4 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v9] Implement UUID v7 as per IETF draft

This commit adds a function to generate UUID v7.
The function optionally accepts a timestamp used to generate
the next UUID.
It also adds a function to extract the timestamp from a UUID v7.

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   |  18 ++-
 src/backend/utils/adt/pseudotypes.c      |  12 +-
 src/backend/utils/adt/uuid.c             | 157 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  14 ++
 src/test/regress/expected/opr_sanity.out |   4 +
 src/test/regress/expected/uuid.out       |  43 +++++++
 src/test/regress/sql/uuid.sql            |  22 ++++
 7 files changed, 267 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..d6c83cb13f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>gen_uuid_v7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_v7_time</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>gen_uuid_v7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
+<synopsis>
+<function>uuid_v7_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID version 7.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/pseudotypes.c b/src/backend/utils/adt/pseudotypes.c
index a3a991f634..82c2d1309d 100644
--- a/src/backend/utils/adt/pseudotypes.c
+++ b/src/backend/utils/adt/pseudotypes.c
@@ -23,6 +23,7 @@
 #include "postgres.h"
 
 #include "libpq/pqformat.h"
+#include "miscadmin.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
 #include "utils/rangetypes.h"
@@ -332,11 +333,18 @@ shell_out(PG_FUNCTION_ARGS)
  *
  * We must disallow input of pg_node_tree values because the SQL functions
  * that operate on the type are not secure against malformed input.
- * We do want to allow output, though.
+ * We do want to allow output, though. Also we need input during bootstrap.
  */
-PSEUDOTYPE_DUMMY_INPUT_FUNC(pg_node_tree);
 PSEUDOTYPE_DUMMY_RECEIVE_FUNC(pg_node_tree);
 
+Datum
+pg_node_tree_in(PG_FUNCTION_ARGS)
+{
+	if (!IsBootstrapProcessingMode())
+		elog(ERROR, "cannot accept a value of type pg_node_tree_in");
+	return textin(fcinfo);
+}
+
 Datum
 pg_node_tree_out(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..66d9672dd1 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,9 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -20,6 +23,7 @@
 #include "utils/builtins.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +425,156 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+gen_uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = tms <= previous_timestamp;
+	}
+	else
+	{
+		ts = PG_GETARG_TIMESTAMPTZ(0);
+		tms = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC) / 1000;
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+	}
+
+	if (increment_counter)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_v7_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if (((uuid->data[6] & 0xf0) != 0x70)
+		|| ((uuid->data[8] & 0xc0) != 0x80))
+		elog(ERROR,"uuid_v7_time() can only extract timestamp from UUID v7");
+
+	tms =			  uuid->data[5];
+	tms += ((uint64_t)uuid->data[4]) << 8;
+	tms += ((uint64_t)uuid->data[3]) << 16;
+	tms += ((uint64_t)uuid->data[2]) << 24;
+	tms += ((uint64_t)uuid->data[1]) << 32;
+	tms += ((uint64_t)uuid->data[0]) << 40;
+
+	ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	PG_RETURN_TIMESTAMPTZ(ts);
+}
+
+Datum
+uuid_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		elog(ERROR,"uuid_ver() is only defined for RFC 4122 variants");
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..c00cd1320f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 7', proisstrict => 'f',
+  proname => 'gen_uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'timestamptz', prosrc => 'gen_uuid_v7',
+  proargnames => '{unix_ts_ms}', pronargdefaults => 1, proargmodes => '{i}',
+  proargdefaults => '({CONST :consttype 1184 :consttypmod -1 :constcollid 0 :constlen 8 :constbyval true :constisnull true :location 46 :constvalue <>})' },
+{ oid => '9896', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_v7_time' },
+{ oid => '9897', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_ver', proleakproof => 't', provolatile => 'i',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_ver' },
+{ oid => '9898', descr => 'extract variant from UUID',
+  proname => 'uuid_var', proleakproof => 't', provolatile => 'i',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..163658f002 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,10 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+gen_uuid_v7(timestamp with time zone)
+uuid_v7_time(uuid)
+uuid_ver(uuid)
+uuid_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..b8426b1f8e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,48 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(now()));
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+SELECT uuid_v7_time(gen_uuid_v7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+ ?column? 
+----------
+ @ 0
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_ver(gen_uuid_v7());
+ uuid_ver 
+----------
+        7
+(1 row)
+
+SELECT uuid_var(gen_uuid_v7());
+ uuid_var 
+----------
+        2
+(1 row)
+
+-- uuid_v7_time() must refuse to accept non-UUIDv7
+select uuid_v7_time(gen_random_uuid());
+ERROR:  uuid_v7_time() can only extract timestamp from UUID v7
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..fb28766ece 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,27 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(now()));
+INSERT INTO guid1 (guid_field) VALUES (gen_uuid_v7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+SELECT uuid_v7_time(gen_uuid_v7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+
+-- support functions for UUID versions and variants
+SELECT uuid_ver(gen_uuid_v7());
+SELECT uuid_var(gen_uuid_v7());
+
+-- uuid_v7_time() must refuse to accept non-UUIDv7
+select uuid_v7_time(gen_random_uuid());
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)
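
For readers following the bit layout in the patch above, here is a standalone sketch of how the 48-bit timestamp, the version nibble, the 18-bit counter and the variant bits share the first nine bytes. It is not the patch code: pack_uuidv7 and unpack_counter are hypothetical names, and the caller passes the seven trailing bytes that the patch takes from pg_strong_random().

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the layout in gen_uuid_v7():
 *   bytes 0..5          48-bit Unix time in ms, big-endian
 *   byte 6 high nibble  version (7)
 *   byte 6 low nibble   counter bits 17..14
 *   byte 7              counter bits 13..6
 *   byte 8 top 2 bits   variant (binary 10)
 *   byte 8 low 6 bits   counter bits 5..0
 *   bytes 9..15         caller-supplied (random in the patch)
 */
static void
pack_uuidv7(uint8_t out[16], uint64_t unix_ts_ms, uint32_t counter,
            const uint8_t tail[7])
{
    for (int i = 0; i < 6; i++)
        out[i] = (uint8_t) (unix_ts_ms >> (40 - 8 * i));

    counter &= 0x3ffff;     /* keep 18 bits */
    out[6] = (uint8_t) (0x70 | ((counter >> 14) & 0x0f));
    out[7] = (uint8_t) (counter >> 6);
    out[8] = (uint8_t) (0x80 | (counter & 0x3f));

    memcpy(&out[9], tail, 7);
}

/* Reassemble the 18-bit counter from bytes 6..8 */
static uint32_t
unpack_counter(const uint8_t u[16])
{
    return ((uint32_t) (u[6] & 0x0f) << 14) |
           ((uint32_t) u[7] << 6) |
           (uint32_t) (u[8] & 0x3f);
}
```

Because the counter sits directly after the timestamp in byte order, two UUIDs with the same millisecond compare in counter order under a plain bytewise comparison.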

#35

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey Borodin (#34)

Re: UUID v7

Hi Andrey,

Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()
2. Renamed get_uuid_v7_time() to uuid_v7_time()
3. Added uuid_ver() and uuid_var().

What do you think?

Many thanks for the updated patch. This is important work and I very
much hope we will see it in the upcoming PG release.

```
+Datum
+pg_node_tree_in(PG_FUNCTION_ARGS)
+{
+    if (!IsBootstrapProcessingMode())
+        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
+    return textin(fcinfo);
+}
```

Not 100% sure what this is for. Any chance this could be part of another patch?

One thing I don't particularly like about the tests is the fact that
they don't check if a correct UUID was actually generated. I realize
that's not quite trivial due to the random nature of the function, but
maybe we could use some substring/regex magic here? Something like:

```
select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
?column?
----------
t

select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
'[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
regexp_replace
--------------------------------------
018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
```
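
The same kind of shape check can be written outside SQL. The sketch below is only an illustration: looks_like_uuidv7 is a hypothetical name, and unlike the regex above it also accepts upper-case hex digits. It verifies the canonical 8-4-4-4-12 text form, then the version digit and the variant digit.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical checker: true if s is a canonical 36-character UUID string
 * whose version digit is 7 and whose variant digit is 8, 9, a or b.
 */
static bool
looks_like_uuidv7(const char *s)
{
    if (strlen(s) != 36)
        return false;

    for (int i = 0; i < 36; i++)
    {
        if (i == 8 || i == 13 || i == 18 || i == 23)
        {
            if (s[i] != '-')
                return false;
        }
        else if (!isxdigit((unsigned char) s[i]))
            return false;
    }

    /* the version is the first hex digit of the third group */
    if (s[14] != '7')
        return false;

    /* variant bits 1,0 make the first digit of the fourth group 8..b */
    return strchr("89abAB", s[19]) != NULL;
}
```

A check like this only validates the fixed bits; the random part stays untested, which is the same limitation the masked regexp_replace output has.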

```
+ proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
```

I don't think we conventionally specify IMMUTABLE volatility, it's the
default. Other values also are worth checking.

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

Also I would like to point out that part of the documentation is
missing, but I guess at this stage of the game it's OK.

Last but not least: maybe we should support casting Timestamp[Tz] to
UUIDv7 and vice versa? Shouldn't be difficult to implement and I
suspect somebody will request this eventually. During the cast to UUID
we will always get the same value for the given Timestamp[Tz], which
probably can be useful in certain applications. It can't be done with
gen_uuid_v7() and its volatility doesn't permit it.

--
Best regards,
Aleksander Alekseev

#36

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#35)

Re: UUID v7

Thanks for your review, Aleksander!

On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
```
+Datum
+pg_node_tree_in(PG_FUNCTION_ARGS)
+{
+    if (!IsBootstrapProcessingMode())
+        elog(ERROR, "cannot accept a value of type pg_node_tree_in");
+    return textin(fcinfo);
+}
```
Not 100% sure what this is for. Any chance this could be part of another patch?

Nope, it’s necessary there. Without these changes, catalog functions cannot have defaults for arguments. These defaults have type pg_node_tree, which has a no-op input function.

One thing I don't particularly like about the tests is the fact that
they don't check if a correct UUID was actually generated. I realize
that's not quite trivial due to the random nature of the function, but
maybe we could use some substring/regex magic here? Something like:

```
select gen_uuid_v7() :: text ~ '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$';
?column?
----------
t

select regexp_replace(gen_uuid_v7('2024-01-16 15:45:33 MSK') :: text,
'[0-9a-f]{4}-[0-9a-f]{12}$', 'XXXX-' || repeat('X', 12));
regexp_replace
--------------------------------------
018d124e-39c8-74c7-XXXX-XXXXXXXXXXXX
```

Any 16 bytes that have the correct ver and var bits (6 bits total) are a correct UUID.
This is checked by tests when uuid_var() and uuid_ver() functions are exercised.
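
To make the "6 bits total" concrete: the version is the high nibble of byte 6 (4 bits) and the variant is the top two bits of byte 8. A minimal sketch in plain C, with hypothetical function names that are not the patch's uuid_ver() and uuid_var():

```c
#include <stdbool.h>
#include <stdint.h>

/* Top two bits of byte 8; 2 (binary 10) is the RFC 4122 variant */
static int
uuid_variant_bits(const uint8_t u[16])
{
    return u[8] >> 6;
}

/* High nibble of byte 6; only meaningful for the RFC 4122 variant */
static int
uuid_version_nibble(const uint8_t u[16])
{
    return u[6] >> 4;
}

/* The six fixed bits of a version 7 UUID: version 0111, variant 10 */
static bool
has_uuidv7_fixed_bits(const uint8_t u[16])
{
    return uuid_version_nibble(u) == 7 && uuid_variant_bits(u) == 2;
}
```

Every other bit is either timestamp, counter or random data, so there is nothing further in the byte pattern itself that a test could reject.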

```
+ proname => 'uuid_v7_time', proleakproof => 't', provolatile => 'i',
```

I don't think we conventionally specify IMMUTABLE volatility, it's the
default. Other values also are worth checking.

Makes sense, I’ll drop these values in the next version.
BTW I’m in doubt whether the provided functions are leakproof. They ERROR out with messages that can give a clue about several bits of the UUID. Does this break leakproofness? I think yes, but I’m not sure.
gen_uuid_v7() seems leakproof to me.

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

I mean, this argument is expected to be used to implement K-way sorted identifiers. In this context, it seems to me, it’s good to remind the developer that time shifts also depend on time zones.
But this is too vague.
Do you have any reasons that apply to UUID generation?

Also I would like to point out that part of the documentation is
missing, but I guess at this stage of the game it's OK.

Last but not least: maybe we should support casting Timestamp[Tz] to
UUIDv7 and vice versa? Shouldn't be difficult to implement and I
suspect somebody will request this eventually. During the cast to UUID
we will always get the same value for the given Timestamp[Tz], which
probably can be useful in certain applications. It can't be done with
gen_uuid_v7() and its volatility doesn't permit it.

I’m strongly opposed to adding this cast. I did not originally add a function to extract the timestamp from a UUID, because the standard does not recommend it. But a lot of people asked for it.
Still, supporting an easy way to do an unrecommended thing seems bad.

Best regards, Andrey Borodin.

#37

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#36)

Re: UUID v7

Andrey,

It is not clear how to interpret uuid_v7_time():
- uuid_v7 to time() (extracting the timestamp)
- time() to uuid_v7 (generation of the uuid_v7)
It is worth improving the naming, for example, adding prepositions.

Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au>

#38

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Sergey Prokhorenko (#37)

Re: UUID v7

On 16 Jan 2024, at 21:49, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

It is not clear how to interpret uuid_v7_time():
• uuid_v7 to time() (extracting the timestamp)
• time() to uuid_v7 (generation of the uuid_v7)
It is worth improving the naming, for example, adding prepositions.

Previously, Jelte had some thoughts on idiomatic function names.

Jelte, what is your opinion on naming the function which extracts timestamp from UUID v7?
Of course, it would be great to hear opinions from anyone else.

Best regards, Andrey Borodin.

#39

postgres@jeltef.nl

almost 2 years ago

In reply to: Andrey M. Borodin (#36)

Re: UUID v7

On Tue, 16 Jan 2024 at 15:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 16 Jan 2024, at 18:00, Aleksander Alekseev <aleksander@timescale.com> wrote:
Not 100% sure what this is for. Any chance this could be part of another patch?

Nope, it’s necessary there. Without these changes, catalog functions cannot have defaults for arguments. These defaults have type pg_node_tree, which has a no-op input function.

That seems like the wrong way to make that work then. How about
instead we define the same function name twice, once with and once
without a timestamp argument. That's how this is done for other
functions that are overloaded in pg_catalog.

#40

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Andrey Borodin (#34)

Re: UUID v7

Andrey Borodin wrote on 1/16/2024 1:15 PM:

Sergey, Przemysław, Jelte, thanks for your feedback.
Here's v9. Changes:
1. Swapped type of the argument to timestamptz in gen_uuid_v7()

Please update docs part about optional timestamp argument.

2. Renamed get_uuid_v7_time() to uuid_v7_time()

Please rename uuid_v7_time to uuid_time() and add support for v1 and v6.
If the version is incompatible, then return NULL.

3. Added uuid_ver() and uuid_var().

Looks good.
But for me, throwing an error is problematic. Wouldn't it be better to
return -1?

What do you think?
Best regards, Andrey Borodin.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#41

[0]: /messages/by-id/6a65610c-46fc-2323-6b78-e8086340a325@2ndquadrant.com

postgres@jeltef.nl

almost 2 years ago

In reply to: Andrey M. Borodin (#38)

Re: UUID v7

On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Jelte, what is your opinion on naming the function which extracts timestamp from UUID v7?

I looked at a few more datatypes: json, jsonb & hstore. The get_
prefix is not used there at all, so I'm still opposed to that. But
they seem to use either an _to_ or an _extract_ infix. _to_ is then
used for conversion of the whole object, and _extract_ is used to
extract a subset. So I think _extract_ would fit well here.

On Fri, 5 Jan 2024 at 11:57, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

When naming functions, I would advise using the shorter abbreviation uuidv7 from the new version of the RFC instead of uuid_v7.

I also agree with that, uuid_v7 looks weird to my eyes. The RFC also
abbreviates them as UUIDv7 (without a space).

The more I look at it the more I also think the gen_ prefix is quite
strange, and I already thought the gen_random_uuid name was quite
weird. But now that we will also have a uuidv7 I think it's even
stranger that one uses the name from the RFC.

The name of gen_random_uuid was taken verbatim from pgcrypto, without
any discussion on the list [0]:

Here is a proposed patch for this. I did a fair bit of looking around
in other systems for a naming pattern but didn't find anything
consistent. So I ended up just taking the function name and code from
pgcrypto.

So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)

#42

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Jelte Fennema-Nio (#41)

Re: UUID v7

Jelte Fennema-Nio wrote on 1/16/2024 9:25 PM:

On Tue, 16 Jan 2024 at 19:17, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
So currently my preference for the function names would be:

- uuidv4() -> alias for gen_random_uuid()
- uuidv7()
- uuidv7(timestamptz)
- uuid_extract_ver(uuid)
- uuid_extract_var(uuid)
- uuidv7_extract_time(uuid)

+1
But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) -
the function should be able to extract the timestamp from v1/v6/v7

I would highly recommend to add:
uuidv5(namespace uuid, name text) -> uuid
using uuid_generate_v5 from uuid-ossp extension
(https://www.postgresql.org/docs/current/uuid-ossp.html)
This is an important version and it should be included in the main PG
code.

Jelte: Please propose the name of the function that will convert uuid
from version 1 to 6.
v6 is almost as good as v7 for indexes. And v6 allows you to convert
from v1 which some people use.
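
The v1 to v6 conversion mentioned here only reorders the 60 timestamp bits so that they sort most significant first; bytes 8..15 are unchanged. A rough sketch, assuming a hypothetical helper name and signature (nothing like it exists in the patch at this point):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: convert a version 1 UUID to version 6.
 * Returns false and leaves v6 untouched if the input is not version 1.
 */
static bool
uuid_v1_to_v6(const uint8_t v1[16], uint8_t v6[16])
{
    uint64_t ts;

    if ((v1[6] >> 4) != 1)
        return false;

    /* v1 stores time_low (0..3), time_mid (4..5), time_hi (6..7) */
    ts = ((uint64_t) (v1[6] & 0x0f) << 56) | ((uint64_t) v1[7] << 48) |
         ((uint64_t) v1[4] << 40) | ((uint64_t) v1[5] << 32) |
         ((uint64_t) v1[0] << 24) | ((uint64_t) v1[1] << 16) |
         ((uint64_t) v1[2] << 8) | (uint64_t) v1[3];

    /* v6 stores the same 60 bits from most to least significant */
    v6[0] = (uint8_t) (ts >> 52);
    v6[1] = (uint8_t) (ts >> 44);
    v6[2] = (uint8_t) (ts >> 36);
    v6[3] = (uint8_t) (ts >> 28);
    v6[4] = (uint8_t) (ts >> 20);
    v6[5] = (uint8_t) (ts >> 12);
    v6[6] = (uint8_t) (0x60 | ((ts >> 8) & 0x0f));
    v6[7] = (uint8_t) ts;

    /* variant, clock sequence and node are carried over as-is */
    memcpy(&v6[8], &v1[8], 8);
    return true;
}
```

Applied to the draft's v1 example C232AB00-9414-11EC-B3C8-9F6BDECED846, this reordering yields 1EC9414C-232A-6B00-B3C8-9F6BDECED846.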
--
Przemysław Sztoch | Mobile +48 509 99 00 66

#43

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Aleksander Alekseev (#35)

Re: UUID v7

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timestamptz always stores UTC internally.
I believe that in SQL, when operating with time in UTC, you should
always use timestamptz.
timestamp is theoretically the same thing, but internally it does not
convert the time to UTC, which leads to incorrect use.
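
Since a timestamptz value is a count of microseconds from 2000-01-01 00:00:00 UTC, turning it into the unix_ts_ms field is a fixed epoch shift with no time zone involved. A minimal sketch of that arithmetic follows; the function and macro names are hypothetical, and the 946684800-second offset is the 10957 days between the Unix and PostgreSQL epochs, the same value the patch derives from POSTGRES_EPOCH_JDATE and UNIX_EPOCH_JDATE.

```c
#include <stdint.h>

/* Seconds from 1970-01-01 to 2000-01-01: 10957 days * 86400 */
#define UNIX_TO_PG_EPOCH_SECS INT64_C(946684800)

/* Microseconds since the PostgreSQL epoch -> milliseconds since the Unix epoch */
static int64_t
pg_us_to_unix_ms(int64_t pg_us)
{
    return (pg_us + UNIX_TO_PG_EPOCH_SECS * 1000000) / 1000;
}

/* Milliseconds since the Unix epoch -> microseconds since the PostgreSQL epoch */
static int64_t
unix_ms_to_pg_us(int64_t unix_ms)
{
    return unix_ms * 1000 - UNIX_TO_PG_EPOCH_SECS * 1000000;
}
```

The conversion drops sub-millisecond precision in one direction, so a round trip through a UUIDv7 only preserves a timestamptz to the millisecond.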
--
Przemysław Sztoch | Mobile +48 509 99 00 66

#44

postgres@jeltef.nl

almost 2 years ago

In reply to: Przemysław Sztoch (#42)

Re: UUID v7

On Tue, 16 Jan 2024 at 22:02, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

But replace uuidv7_extract_time(uuid) with uuid_extract_time(uuid) - the function should be able to extract the timestamp from v1/v6/v7

I'm fine with this.

I would highly recommend to add:
uuidv5(namespace uuid, name text) -> uuid
using uuid_generate_v5 from uuid-ossp extension (https://www.postgresql.org/docs/current/uuid-ossp.html)
This is an important version and it should be included in the main PG code.

I think adding more uuid versions would probably be desirable. But I
don't think it makes sense to clutter this patchset with that. I feel
like on this uuidv7 patchset we've had enough discussion that it could
reasonably get into PG17, but I think adding even more uuid versions
to this patchset would severely reduce the chances of that happening.

#45

[0]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Jelte Fennema-Nio (#44)

1 attachment(s)

Re: UUID v7

On 17 Jan 2024, at 02:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

I want to ask Kyzer or Brad, I hope they will see this message. I'm working on the patch for time extraction for v1, v6 and v7.

Do I understand correctly that UUIDs contain local time, not UTC time? In the examples in [0], under "A.6. Example of a UUIDv7 Value", I see that February 22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which seems to be local time, not UTC.
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...
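
The reading of the A.6 value can be checked independently of PostgreSQL. The sketch below is plain C with a hypothetical helper name; it treats 0x017F22E279B0 as milliseconds since the Unix epoch and formats the result in UTC. February 22, 2022 2:22:22 PM at GMT-05:00 corresponds to 19:22:22 UTC, so 19:22:22 would indicate a UTC-based value and 14:22:22 a local-time one.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: format a Unix-epoch millisecond count as
 * "YYYY-MM-DD HH:MM:SS" in UTC. Leaves buf empty if conversion fails.
 */
static void
format_unix_ms_utc(uint64_t unix_ms, char *buf, size_t buflen)
{
    time_t      secs = (time_t) (unix_ms / 1000);
    struct tm  *tm_utc = gmtime(&secs);

    buf[0] = '\0';
    if (tm_utc != NULL)
        strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", tm_utc);
}
```

Calling format_unix_ms_utc(0x017F22E279B0, buf, sizeof buf) produces 2022-02-22 19:22:22, which suggests the draft's example is expressed in UTC and the GMT-05:00 wall-clock time is only how the example is described.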

If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone         
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)

Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions instead of optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.

Best regards, Andrey Borodin.

Attachments:

v10-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From f5e527d49a792b9b140562c7d42e0b1b15f5b315 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v10] Implement UUID v7 as per IETF draft

This commit addes function to generate UUID v7.
This function optionally accepts datetime used to generate
next UUID.
Also we add a function to extract timestamp from UUID v7.

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   |  18 ++-
 src/backend/utils/adt/uuid.c             | 192 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  19 +++
 src/test/regress/expected/opr_sanity.out |  13 +-
 src/test/regress/expected/uuid.out       |  88 +++++++++++
 src/test/regress/sql/uuid.sql            |  35 +++++
 6 files changed, 361 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..417e0c7f19 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,29 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes two functions to generate a UUID:
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
    This function returns a version 4 (random) UUID.  This is the most commonly
    used type of UUID and is appropriate for most applications.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID version 7.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..3538fcba6d 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,190 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = tms <= previous_timestamp;
+	}
+	else
+	{
+		ts = PG_GETARG_TIMESTAMPTZ(0);
+		tms = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC) / 1000;
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+	}
+
+	if (increment_counter)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - date2j(1582,10,15)) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - date2j(1582,10,15)) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..956fb08ce9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,25 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7', proisstrict => 'f',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'timestamptz', prosrc => 'uuidv7',
+  proargnames => '{unix_ts_ms}', proargmodes => '{i}' },
+{ oid => '9898', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9899', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9900', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..1c37533975 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -872,6 +873,12 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuidv7(timestamp with time zone)
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..b61f7a64ff 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,93 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+ ?column? 
+----------
+ @ 0
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone         
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone         
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') at time zone 'GMT-05';
+         timezone         
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..d7185759b1 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,40 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') at time zone 'GMT-05';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') at time zone 'GMT-05';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

#46

aleksander@timescale.com

almost 2 years ago

In reply to: Przemysław Sztoch (#43)

Re: UUID v7

Hi,

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timestamptz internally always stores UTC.
I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect use.

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.

--
Best regards,
Aleksander Alekseev

#47

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#46)

Re: UUID v7

On 18 Jan 2024, at 19:20, Aleksander Alekseev <aleksander@timescale.com> wrote:

Timestamp and TimestampTz are absolutely the same thing.

My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's explained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Best regards, Andrey Borodin.

#48

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey Borodin (#47)

Re: UUID v7

Hi Andrey,

Timestamp and TimestampTz are absolutely the same thing.

My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's explained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1] (https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html):

"""
This example UUIDv7 test vector utilizes a well-known 32 bit Unix
epoch with additional millisecond precision to fill the first 48 bits
[...]
The timestamp is Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00
represented as 0x17F22E279B0 or 1645557742000
"""

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00' :: timestamptz)*1000;
?column?
----------------------
1645521742000.000000
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

--
Best regards,
Aleksander Alekseev

#49

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Aleksander Alekseev (#48)

Re: UUID v7

Hi Andrey,

Aleksander Alekseev wrote: "If this is the case, I think the example is indeed wrong".

This is one of the reasons why I was categorically against including any implementation examples in the new RFC. The examples have been very poorly studied and discussed, so it is better not to use them at all. But the text of the RFC itself clearly refers to UTC, not to local time: "UUID version 7 features a time-ordered value field derived from the widely implemented and well known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded". The main reason for using UTC is so that UUIDv7s generated at approximately the same time in different time zones are correctly ordered in time when they end up in one database.

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Thursday, 18 January 2024 at 07:22:05 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi Andrey,

Timestamp and TimestampTz are absolutely the same thing.

My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's explained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Not 100% sure which text you are referring to exactly, but I'm
guessing it's section B.2 of [1] (https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html)

If this is the case, I think the example is indeed wrong:

```
=# select extract(epoch from 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00' :: timestamptz)*1000;
?column?
----------------------
1645521742000.000000
(1 row)
```

And the difference between the value in the text and the actual value
is 10 hours as you pointed out.

Also you named the date 1582-10-15 00:00:00 UTC. Maybe you actually
meant 1970-01-01 00:00:00 UTC?

--
Best regards,
Aleksander Alekseev

#50

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Andrey Borodin (#47)

Re: UUID v7

On 18 Jan 2024, at 20:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

But 164555774200000ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

'2022-02-22 19:22:22 UTC' is exactly the moment that was encoded into the example UUIDs. It's not '2022-02-23 00:22:22 in UTC-05' as I thought.
I got confused by "AT TIME ZONE", which in fact removes the time zone information. And that's per the SQL standard...

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
?column?
----------
10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
timestamptz
------------------------
2022-02-22 04:22:22-05
(1 row)

I cannot wrap my mind around it... Any pointers would be appreciated.
I'm certain that the code extracts UTC time correctly; I just want a reliable test that verifies the timestamp constant (and I want to understand what is going on).

Best regards, Andrey Borodin.

#51

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey Borodin (#50)

Re: UUID v7

Hi Andrey,

You'd better generate a test UUIDv7 for midnight 1 Jan 1970 UTC. In this case, the timestamp in UUIDv7 according to the new RFC must be filled with zeros. By extracting the timestamp from this test UUIDv7, you should get exactly midnight 1 Jan 1970 UTC.
I also recommend this article: https://habr.com/ru/articles/772954/

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Thursday, 18 January 2024 at 09:31:16 pm GMT+3, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 18 Jan 2024, at 20:39, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

But 164555774200000ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
?column?
----------
10:00:00
(1 row)

postgres=# select TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
timestamptz
------------------------
2022-02-22 04:22:22-05
(1 row)

Best regards, Andrey Borodin.

#52

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Andrey Borodin (#45)

Re: UUID v7

Using local time would be absurd, especially since local time jumps
backwards at the end of summer time.
I believe our implementation should use UTC. No one forbids us from
assuming that our local time for generating UUIDs is UTC.

Andrey Borodin wrote on 1/18/2024 2:17 PM:

On 17 Jan 2024, at 02:19, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

I want to ask Kyzer or Brad, I hope they will see this message. I'm working on the patch for time extraction for v1, v6 and v7.

Do I understand correctly that UUIDs contain local time, not UTC time? In the examples in [0], e.g. "A.6. Example of a UUIDv7 Value", I see that February 22, 2022 2:22:22.00 PM GMT-05:00 results in unix_ts_ms = 0x017F22E279B0, which is not UTC, but local time.
Is it intentional? Section "5.1. UUID Version 1" states otherwise.

If so, I should swap signatures of functions from TimestampTz to Timestamp.
I'm hard-coding examples from this standard to tests, so I want to be precise...
If I follow the standard I see this in tests:
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') at time zone 'GMT-05';
+         timezone
+--------------------------
+ Wed Feb 23 00:22:22 2022
+(1 row)
Current patch version attached. I've addressed all other requests: function renames, aliases, multiple functions instead of optional params, cleaner catalog definitions, not throwing error when [var,ver,time] value is unknown.
What is left: deal with timezones, improve documentation.

Best regards, Andrey Borodin.

[0] https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#name-example-of-a-uuidv1-value

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#53

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Aleksander Alekseev (#46)

Re: UUID v7

Aleksander Alekseev wrote on 1/18/2024 3:20 PM:

Hi,

Another question: how did you choose between using TimestampTz and
Timestamp types? I realize that internally it's all the same. Maybe
Timestamp will be slightly better since the way it is displayed
doesn't depend on the session settings. Many people I talked to find
this part of TimestampTz confusing.

timestamptz internally always stores UTC.
I believe that in SQL, when operating with time in UTC, you should always use timestamptz.
timestamp is theoretically the same thing. But internally it does not convert time to UTC and will lead to incorrect use.

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.

Theoretically, you're right. But look at this example:

SET timezone TO 'Europe/Warsaw';
SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp),
extract(epoch from '2024-01-18 9:27:30'::timestamptz);

date_part | date_part
------------+------------
1705570050 | 1705566450
(1 row)

In my opinion, timestamptz gives greater guarantees that the time
internally is in UTC and the user gets the time in his/her time zone.

In the case of timestamp, it is never certain whether it keeps time in
UTC or in the local zone.

In the case of argument's type, there would be no problem because we
could create two functions.
Of course timestamp would be treated the same as timestamptz.
But here we have a problem with the function return type, which can only
be one. And since the time returned is in UTC, it should be timestamptz.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#54

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Andrey Borodin (#47)

Re: UUID v7

We are not allowed to consider any time other than UTC.

You need to write to the authors of the standard. I suppose this is a
mistake.

I know from experience that errors in such standards most often appear
in examples.
Nobody detects them at first.
Everyone reads and checks ideas, not calculations.
Then developers tear their hair out during implementation.

Andrey Borodin wrote on 1/18/2024 4:39 PM:

On 18 Jan 2024, at 19:20, Aleksander Alekseev <aleksander@timescale.com> wrote:

Timestamp and TimestampTz are absolutely the same thing.

My question is not about Postgres data types. I'm asking about examples in the standard.

There's an example 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. It is expected to be generated on "Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00".
It's explained to be 164555774200000ns after 1582-10-15 00:00:00 UTC.

But 164555774200000ns after 1582-10-15 00:00:00 UTC was 2022-02-22 19:22:22 UTC. And that was 2022-02-23 00:22:22 in UTC-05.

Best regards, Andrey Borodin.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#55

Lukas Fittl

lukas@fittl.com

almost 2 years ago

In reply to: Andrey Borodin (#45)

Re: UUID v7

On Thu, Jan 18, 2024 at 5:18 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Current patch version attached. I've addressed all other requests:
function renames, aliases, multiple functions instead of optional params,
cleaner catalog definitions, not throwing error when [var,ver,time] value
is unknown.
What is left: deal with timezones, improve documentation.

I've done a test of the v10 patch, and ran into an interesting behavior
when passing in a timestamp to the function (which, as a side note, is
actually very useful to have as a feature, to support creating time-based
range partitions on UUIDv7 fields):

postgres=# SELECT uuid_extract_time(uuidv7());
uuid_extract_time
---------------------------
2024-01-18 18:49:00.01-08
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7('2024-04-01'));
uuid_extract_time
------------------------
2024-04-01 00:00:00-07
(1 row)

postgres=# SELECT uuid_extract_time(uuidv7());
uuid_extract_time
------------------------
2024-04-01 00:00:00-07
(1 row)

Note how calling the uuidv7 function again after having called it with a
fixed future timestamp, returns the future timestamp, even though it should
return the current time.

I believe this is caused by incorrectly re-using the cached
previous_timestamp. In the second call here (with a fixed future
timestamp), we end up setting ts and tms to 2024-04-01, with
increment_counter = false, which leads us to set previous_timestamp to the
passed in timestamp (else branch of the second if in uuidv7). When we then
call the function again without an argument, we end up getting a new
timestamp from gettimeofday, but because we try to detect backwards leaps,
we set increment_counter to true, and thus end up reusing the previous
(future) timestamp here:

/* protection from leap backward */
tms = previous_timestamp;

Not sure how to fix this, but clearly something is amiss here.

Thanks,
Lukas

--
Lukas Fittl

#56

https://www.postgresql.org/docs/current/datetime-posix-timezone-specs.html

david.g.johnston@gmail.com

almost 2 years ago

In reply to: Andrey Borodin (#50)

Re: UUID v7

On Thu, Jan 18, 2024 at 11:31 AM Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

Now I'm completely lost in time... I've set local time to NY (UTC-5).

postgres=# select TIMESTAMP WITH TIME ZONE '2022-02-22 14:22:22-05' - TIMESTAMP WITH TIME ZONE 'Tuesday, February 22, 2022 2:22:22.00 PM GMT-05:00';
?column?
----------
10:00:00
(1 row)

You are mixing POSIX and ISO-8601 conventions and, as noted in our
appendix, they disagree on the direction that is positive.

The offset fields specify the hours, and optionally minutes and seconds,
difference from UTC. They have the format hh[:mm[:ss]] optionally with a
leading sign (+ or -). The positive sign is used for zones west of
Greenwich. (Note that this is the opposite of the ISO-8601 sign convention
used elsewhere in PostgreSQL.)

David J.

#57

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: David G. Johnston (#56)

1 attachment(s)

Re: UUID v7

On 19 Jan 2024, at 08:24, David G. Johnston <david.g.johnston@gmail.com> wrote:

You are mixing POSIX and ISO-8601 conventions and, as noted in our appendix, they disagree on the direction that is positive.

Thanks! Now everything seems to be in its place.

I want to include the following tests in the patch:
-- extract UUID v1, v6 and v7 timestamp
SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';

What do you think, will this be stable across the whole buildfarm? Or should we change anything to avoid false positives caused by differences in timestamp parsing?

On 19 Jan 2024, at 07:58, Lukas Fittl <lukas@fittl.com> wrote:

Note how calling the uuidv7 function again after having called it with a fixed future timestamp, returns the future timestamp, even though it should return the current time.

Thanks for the review.
Well, that was intentional. But now I see that it is rather confusing behaviour. I've changed it to the more expected behaviour.

Also, I've added some documentation on all functions.

Best regards, Andrey Borodin.

Attachments:

v11-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream)

From 98726fc9b75d9abaec8bb4b305531e79d16bf6b0 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v11] Implement UUID v7 as per IETF draft

This commit adds function to generate UUID v7.
This function optionally accepts datetime used to generate
next UUID.
Also we add a function to extract timestamp from UUID v7.

Authors: Andrey Borodin, Sergey Prokhorenko
---
 doc/src/sgml/func.sgml                   |  36 ++++-
 src/backend/utils/adt/uuid.c             | 195 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  19 +++
 src/test/regress/expected/opr_sanity.out |  13 +-
 src/test/regress/expected/uuid.out       |  88 ++++++++++
 src/test/regress/sql/uuid.sql            |  35 ++++
 6 files changed, 380 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..1d4d48d7cb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,43 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1,6 and 7. For other
+   versions and variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts a version bits from UUID of variants described by
+   IETF standard (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..6125061b35 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,193 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+static bool external_times_used = false;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = (tms <= previous_timestamp) && !external_times_used;
+		external_times_used = false;
+	}
+	else
+	{
+		ts = PG_GETARG_TIMESTAMPTZ(0);
+		tms = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC) / 1000;
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+		external_times_used = true;
+	}
+
+	if (increment_counter)
+	{
+		/* Time did not increment from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * http://tools.ietf.org/html/rfc ???
+	 * https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format#name-creating-a-uuidv7-value
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - date2j(1582,10,15)) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - date2j(1582,10,15)) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..956fb08ce9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,25 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7', proisstrict => 'f',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'timestamptz', prosrc => 'uuidv7',
+  proargnames => '{unix_ts_ms}', proargmodes => '{i}' },
+{ oid => '9898', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9899', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9900', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..1c37533975 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -872,6 +873,12 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuidv7(timestamp with time zone)
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..df78fd0385 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,93 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+ ?column? 
+----------
+ @ 0
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..c7a09dd21d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,40 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)
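
As a reading aid for the extraction functions in the patch above, here is a minimal standalone sketch of the same bit manipulations on a raw 16-byte UUID. It is an illustration under assumptions, not the patch code: the function names are hypothetical, results are returned as Unix milliseconds rather than as a TimestampTz, and -1 is used where the SQL functions would return NULL.

```c
#include <stdint.h>

#define UUID_LEN 16

/* Variant: top two bits of byte 8 (2 is the layout described by the IETF documents). */
int
uuid_variant(const uint8_t u[UUID_LEN])
{
	return u[8] >> 6;
}

/* Version: top four bits of byte 6, meaningful only for variant b10xx; -1 otherwise. */
int
uuid_version(const uint8_t u[UUID_LEN])
{
	if ((u[8] & 0xc0) != 0x80)
		return -1;
	return u[6] >> 4;
}

/* Version 7: the first 48 bits are a big-endian count of ms since the Unix epoch. */
uint64_t
uuid_v7_unix_ms(const uint8_t u[UUID_LEN])
{
	uint64_t	ms = 0;

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | u[i];
	return ms;
}

/* Version 1: 60-bit count of 100-ns ticks since 1582-10-15, stored low field first. */
uint64_t
uuid_v1_ticks(const uint8_t u[UUID_LEN])
{
	uint64_t	time_low = ((uint64_t) u[0] << 24) | ((uint64_t) u[1] << 16) |
						   ((uint64_t) u[2] << 8) | u[3];
	uint64_t	time_mid = ((uint64_t) u[4] << 8) | u[5];
	uint64_t	time_hi = ((uint64_t) (u[6] & 0x0f) << 8) | u[7];

	return (time_hi << 48) | (time_mid << 32) | time_low;
}

/* Version 1 timestamp converted to ms since the Unix epoch. */
uint64_t
uuid_v1_unix_ms(const uint8_t u[UUID_LEN])
{
	/* (2440588 - 2299161) days between the two epochs, in 100-ns ticks */
	const uint64_t gregorian_to_unix_ticks = 122192928000000000ULL;

	return (uuid_v1_ticks(u) - gregorian_to_unix_ticks) / 10000;
}
```

Applied to the v1 and v7 test UUIDs from the regression test, both timestamp helpers should produce the same instant, which is what the three equality checks at the end of uuid.sql rely on.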

#58

aleksander@timescale.com

almost 2 years ago

In reply to: Przemysław Sztoch (#53)

Re: UUID v7

Hi,

No.

Timestamp and TimestampTz are absolutely the same thing. The only
difference is how they are shown to the user. TimestampTz uses session
context in order to be displayed in the TZ chosen by the user. Thus
typically it is somewhat more confusing to the users and thus I asked
whether there was a good reason to choose TimestampTz over Timestamp.

Theoretically, you're right. But look at this example:

SET timezone TO 'Europe/Warsaw';
SELECT extract(epoch from '2024-01-18 9:27:30'::timestamp), extract(epoch from '2024-01-18 9:27:30'::timestamptz);

 date_part  | date_part
------------+------------
 1705570050 | 1705566450
(1 row)

In my opinion, timestamptz gives greater guarantees that the time internally is in UTC and the user gets the time in his/her time zone.

I believe you didn't notice, but this example just proves my point.

In this case you have two timestamps that are different _internally_,
but the way they are _shown_ is the same because the first one is in
UTC and the second one in your local session timezone, Europe/Warsaw.
extract(epoch ...) extracts the UNIX epoch, i.e. it relies on the _internal_
representation. This is why you got different results.
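
The arithmetic behind the two numbers in the example can be reproduced with a short sketch. It assumes that Europe/Warsaw is at UTC+1 on 2024-01-18 (no daylight saving in January); the helper names are hypothetical, and the day-count routine is the well-known civil-calendar algorithm rather than anything taken from PostgreSQL.

```c
#include <stdint.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date. */
int64_t
days_from_civil(int64_t y, unsigned m, unsigned d)
{
	y -= (m <= 2);
	const int64_t era = (y >= 0 ? y : y - 399) / 400;
	const unsigned yoe = (unsigned) (y - era * 400);			/* [0, 399] */
	const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;

	return era * 146097 + (int64_t) doe - 719468;
}

/*
 * Unix epoch seconds for a wall-clock reading taken at the given UTC offset.
 * An offset of 0 treats the reading as if it were UTC.
 */
int64_t
epoch_seconds(int64_t y, unsigned m, unsigned d,
			  unsigned hh, unsigned mi, unsigned ss,
			  int utc_offset_seconds)
{
	int64_t		wall = days_from_civil(y, m, d) * 86400 +
					   (int64_t) hh * 3600 + (int64_t) mi * 60 + (int64_t) ss;

	return wall - utc_offset_seconds;
}
```

With an offset of 0 the reading 2024-01-18 09:27:30 gives 1705570050, and with an offset of +3600 it gives 1705566450, so the two results in the example differ by exactly the one-hour session offset.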

This demonstrates that TimestampTz is a permanent source of confusion
for users, which is why I would personally prefer that UUIDv7
always use Timestamp (no Tz). Timestamp can be converted to
TimestampTz by users who need it and have experience using it.

--
Best regards,
Aleksander Alekseev

#59

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Andrey Borodin (#57)

1 attachment(s)

Re: UUID v7

On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Best regards, Andrey Borodin.

Attachments:

v12-0001-Implement-UUID-v7.patch (application/octet-stream)

From 10401a50691e7d8c8416895b9862beb1444d2bfc Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v12] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation.
The most important function here is uuidv7(), which generates
a new UUID according to the new standard. This function can optionally
accept a timestamp to use instead of the current time. This allows
implementation of k-way sortable identifiers.
For code readability this commit adds the alias uuidv4() for the function
gen_random_uuid().
It also adds a function to extract the timestamp from UUID v1, v6 and v7.
To allow users to distinguish various UUID versions and variants,
it adds the functions uuid_extract_ver() and uuid_extract_var().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
---
 doc/src/sgml/func.sgml                   |  49 +++++-
 src/backend/utils/adt/uuid.c             | 195 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  19 +++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |  13 +-
 src/test/regress/expected/uuid.out       |  88 ++++++++++
 src/test/regress/sql/uuid.sql            |  35 ++++
 7 files changed, 395 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..ce6715721f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,56 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_ver</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_var</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
+   versions and variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts a version bits from UUID of variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..386cdb7a73 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,193 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+static bool external_times_used = false;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = (tms <= previous_timestamp) && !external_times_used;
+		external_times_used = false;
+	}
+	else
+	{
+		ts = PG_GETARG_TIMESTAMPTZ(0);
+		tms = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC) / 1000;
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+		external_times_used = true;
+	}
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58811a6530..956fb08ce9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,25 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7', proisstrict => 'f',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'timestamptz', prosrc => 'uuidv7',
+  proargnames => '{unix_ts_ms}', proargmodes => '{i}' },
+{ oid => '9898', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9899', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9900', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..1c37533975 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -872,6 +873,12 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuidv7(timestamp with time zone)
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..df78fd0385 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,93 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+ ?column? 
+----------
+ @ 0
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..c7a09dd21d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,40 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

#60

nik@postgres.ai

almost 2 years ago

In reply to: Andrey Borodin (#59)

Re: UUID v7

On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru>
wrote:

On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in
pg17. Yes, the RFC is not fully finalized yet, but it's very close. And
many libraries are already including implementation of UUIDv7 – here are
some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Nik

#61

nikolay@samokhvalov.com

almost 2 years ago

In reply to: Nikolay Samokhvalov (#60)

Re: UUID v7

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Manually tested uuidv7(), uuid_extract_time() – they work as expected. The basic docs provided look clear.

I haven't checked the tests or possible edge cases, though, so I'm leaving it as "needs review" while waiting for more reviewers.

#62

aleksander@timescale.com

almost 2 years ago

In reply to: Nikolay Samokhvalov (#60)

Re: UUID v7

Hi,

But now (after big timeseries project with multiple time zones and DST problems) I think differently.
Even though timestamp and timestamptz are practically the same, timestamptz should be used to store the time in UTC.
Choosing timestamp is more likely to lead to problems and misunderstandings than timestamptz.

As somebody who contributed TZ support to TimescaleDB I'm more or less
aware of the pros and cons of Timestamp and TimestampTz :)
Engineering is all about compromises. I can imagine a project where it
makes sense to use only TimestampTz for the entire database, and the
opposite - when it's better to use only UTC and Timestamp. In this
particular case I was merely concerned that the particular choice
could be confusing for the users but I think I changed my mind by now,
see below.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Thank you, Andrey! I have just checked v12 – cleanly applied to HEAD, and functions work well. I especially like the fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation timestamp column (e.g., "created_at") at all, saving 8 bytes.

The docs and comments look great too.

Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

Thanks!

After playing with v12 I'm inclined to agree that it's RfC.

I only have a couple of silly nitpicks:

- It could make sense to decompose the C implementation of uuidv7() in
two functions, for readability.
- It could make sense to get rid of curly braces in SQL tests when
calling uuid_extract_ver() and uuid_extract_ver(), for consistency.

I'm not going to insist on these changes though and prefer leaving it
to the author and the committer to decide.

Also I take back what I said above about using Timestamp instead of
TimestampTz. I forgot that Timestamps are implicitly cast to
TimestampTz's, so users preferring Timestamps can do this:

```
=# select uuidv7('2024-01-22 12:34:56' :: timestamp);
uuidv7
--------------------------------------
018d3085-de00-77c1-9e7b-7b04ddb9ebb9
```

Cfbot also seems to be happy with the patch so I'm changing the CF
entry status to RfC.

--
Best regards,
Aleksander Alekseev

#63

aleksander@timescale.com

almost 2 years ago

In reply to: Aleksander Alekseev (#62)

Re: UUID v7

Hi,

Cfbot also seems to be happy with the patch so I'm changing the CF
entry status to RfC.

I've found a bug:

```
=# select now() - interval '5000 years';
?column?
----------------------------------------
2977-01-24 15:29:01.779462+02:30:17 BC

Time: 0.957 ms

=# select uuidv7(now() - interval '5000 years');
uuidv7
--------------------------------------
720c1868-0764-7677-99cd-265b84ea08b9

=# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
uuid_extract_time
----------------------------
5943-08-26 21:30:44.836+03
```
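
One way to read the output above, as a sketch only: if the millisecond count is negative (a time before 1970) and only its low 48 bits end up in the UUID, the stored field wraps around to a large positive value. The helper names `wrap_to_48_bits` and `show_wrap` and the `V7_TS_MASK` macro are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The unix_ts_ms field of UUID v7 is 6 bytes (48 bits) wide. */
#define V7_TS_MASK UINT64_C(0xFFFFFFFFFFFF)

/*
 * Hypothetical helper: keep only the low 48 bits of a signed millisecond
 * count, which is what storing the bytes (tms >> 40) ... (tms) amounts to.
 */
uint64_t
wrap_to_48_bits(int64_t unix_ms)
{
	return ((uint64_t) unix_ms) & V7_TS_MASK;
}

/* Print the 48-bit field that would be stored for a signed input. */
void
show_wrap(int64_t unix_ms)
{
	uint64_t	field = wrap_to_48_bits(unix_ms);

	assert(field <= V7_TS_MASK);
	printf("%lld ms -> 0x%012llx\n",
		   (long long) unix_ms, (unsigned long long) field);
}
```

For the 2977 BC input the millisecond count is roughly -1.56e14. Adding 2^48 (about 2.81e14) leaves roughly 1.25e14 ms, a bit under 4000 years after 1970, which is consistent with the year 5943 shown above.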

--
Best regards,
Aleksander Alekseev

#64

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#63)

Re: UUID v7

On 24 Jan 2024, at 17:31, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

Cfbot also seems to be happy with the patch so I'm changing the CF
entry status to RfC.

I've found a bug:

```
=# select now() - interval '5000 years';
?column?
----------------------------------------
2977-01-24 15:29:01.779462+02:30:17 BC

Time: 0.957 ms

=# select uuidv7(now() - interval '5000 years');
uuidv7
--------------------------------------
720c1868-0764-7677-99cd-265b84ea08b9

=# select uuid_extract_time('720c1868-0764-7677-99cd-265b84ea08b9');
uuid_extract_time
----------------------------
5943-08-26 21:30:44.836+03
```

The UUIDv7 range does not correspond to the timestamp range. But its purpose is not storing a timestamp, but being a unique identifier. So I don’t think it's worth throwing an error when an overflowing value is given. BTW if you subtract some nanoseconds, you will not get back the timestamp you put into the UUID either.
A UUID does not store a timestamp, it only uses one to generate an identifier. Some value can be extracted back, but with limited precision, limited range, and only if the UUID was generated precisely by the specification in the standard (and the standard allows deviation! Most implementations trade something off).
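
To make the "limited precision" point concrete, here is a minimal sketch of reading the timestamp back from a version 7 UUID, assuming the layout in which the first six bytes hold a big-endian millisecond count since the Unix epoch. The function name `v7_unix_ms` and the `sample_v7` array are illustrative only; the bytes are those of the v7 value used in the regression test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UUID_LEN 16

/*
 * Illustrative helper, not the patch's function: read the 48-bit big-endian
 * unix_ts_ms field from the first six bytes of a version 7 UUID.
 * Anything finer than a millisecond was never stored, so it cannot come back.
 */
uint64_t
v7_unix_ms(const unsigned char uuid[UUID_LEN])
{
	uint64_t	ms = 0;

	assert(uuid != NULL);
	for (size_t i = 0; i < 6; i++)
		ms = (ms << 8) | uuid[i];
	return ms;
}

/* Bytes of 017F22E2-79B0-7CC3-98C4-DC0C0C07398F from the regression test. */
const unsigned char sample_v7[UUID_LEN] = {
	0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x7C, 0xC3,
	0x98, 0xC4, 0xDC, 0x0C, 0x0C, 0x07, 0x39, 0x8F
};
```

Calling `v7_unix_ms(sample_v7)` yields 1645557742000, i.e. a whole number of milliseconds; the sub-millisecond part of the original time is simply not there.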

Best regards, Andrey Borodin.

#65

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey M. Borodin (#64)

Re: UUID v7

Hi,

The UUIDv7 range does not correspond to the timestamp range. But its purpose is not storing a timestamp, but being a unique identifier. So I don’t think it's worth throwing an error when an overflowing value is given. BTW if you subtract some nanoseconds, you will not get back the timestamp you put into the UUID either.
A UUID does not store a timestamp, it only uses one to generate an identifier. Some value can be extracted back, but with limited precision, limited range, and only if the UUID was generated precisely by the specification in the standard (and the standard allows deviation! Most implementations trade something off).

I don't claim that UUIDv7 purpose is storing timestamps, but I think
the invariant:

```
uuid_extract_time(uuidv7(X)) == X
```

and (!) even more importantly:

```
if X > Y then uuidv7(X) > uuidv7(Y)
```

... should hold. Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate a unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

--
Best regards,
Aleksander Alekseev

#66

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#65)

Re: UUID v7

On 24 Jan 2024, at 18:02, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

The UUIDv7 range does not correspond to the timestamp range. But its purpose is not storing a timestamp, but being a unique identifier. So I don’t think it's worth throwing an error when an overflowing value is given. BTW if you subtract some nanoseconds, you will not get back the timestamp you put into the UUID either.
A UUID does not store a timestamp, it only uses one to generate an identifier. Some value can be extracted back, but with limited precision, limited range, and only if the UUID was generated precisely by the specification in the standard (and the standard allows deviation! Most implementations trade something off).

I don't claim that UUIDv7 purpose is storing timestamps, but I think
the invariant:

```
uuid_extract_time(uuidv7(X)) == X
```

and (!) even more importantly:

```
if X > Y then uuidv7(X) > uuidv7(Y)
```

... should hold.

The function that extracts the timestamp does not provide any guarantees at all. The standard states this, see Kyzer's answers upthread.
Moreover, the standard urges against relying on uuidX < uuidY whenever uuidX was generated before uuidY. The standard does a lot to make this happen, but does not guarantee it.
All that is guaranteed is uniqueness under certain conditions.

Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate a unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

Think of the value that you pass to the UUID generation function as entropy. It’s there to ensure uniqueness and promote ordering (but not guarantee it).

Best regards, Andrey Borodin.

#67

aleksander@timescale.com

almost 2 years ago

In reply to: Aleksander Alekseev (#65)

Re: UUID v7

Hi,

Values that violate named invariants should be rejected with an error.

To clarify, I don't think we should bother about the precision part.
"Equals" in the example above means "equal within UUIDv7 precision",
same for "more" and "less". However, years 2977 BC and 5943 AD are
clearly not equal, thus 2977 BC should be rejected as an invalid value
for UUIDv7.

--
Best regards,
Aleksander Alekseev

#68

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey M. Borodin (#66)

Re: UUID v7

Hi,

The function that extracts the timestamp does not provide any guarantees at all. The standard states this, see Kyzer's answers upthread.
Moreover, the standard urges against relying on uuidX < uuidY whenever uuidX was generated before uuidY. The standard does a lot to make this happen, but does not guarantee it.
All that is guaranteed is uniqueness under certain conditions.

Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate a unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

Think of the value that you pass to the UUID generation function as entropy. It’s there to ensure uniqueness and promote ordering (but not guarantee it).

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees. I'm convinced that these guarantees
will be useful in real-world applications, at least the ones acting
exclusively within Postgres.

This being said, I understand your point of view too. Let's see what
other people think.

--
Best regards,
Aleksander Alekseev

#69

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#68)

Re: UUID v7

On 24 Jan 2024, at 18:29, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

The function that extracts the timestamp does not provide any guarantees at all. The standard states this, see Kyzer's answers upthread.
Moreover, the standard urges against relying on uuidX < uuidY whenever uuidX was generated before uuidY. The standard does a lot to make this happen, but does not guarantee it.
All that is guaranteed is uniqueness under certain conditions.

Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate a unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

Think of the value that you pass to the UUID generation function as entropy. It’s there to ensure uniqueness and promote ordering (but not guarantee it).

If the standard doesn't guarantee something it doesn't mean it forbids
us to give stronger guarantees.

No, the standard makes these guarantees impossible.
If we insist that uuid_extract_time(uuidv7(time))==time, we won't be able to generate uuidv7 most of the time. uuidv7(now()) will always ERROR-out.
The standard implies a more coarse-grained timestamp than we have.

Also, please note that uuidv7(time+1us) and uuidv7(time) will have the same internal timestamp, so despite time+1us > time, the second UUID will still be greater.

Both invariants you proposed cannot be reasonably guaranteed. Upholding any of them greatly reduces usability of UUID v7.
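
The precision argument can be sketched in a few lines. This is only an illustration, assuming the same epoch conversion as the patch (microseconds since 2000-01-01 converted to milliseconds since 1970-01-01); the function names `pg_us_to_unix_ms` and `same_v7_timestamp` are hypothetical, and the constants are repeated here so the block stands alone.

```c
#include <assert.h>
#include <stdint.h>

/* Values as used by the patch via src/include/datatype/timestamp.h */
#define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
#define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
#define SECS_PER_DAY			86400
#define USECS_PER_SEC			INT64_C(1000000)

/* Convert microseconds since the Postgres epoch to milliseconds since 1970. */
int64_t
pg_us_to_unix_ms(int64_t pg_us)
{
	int64_t		epoch_shift_us =
		(int64_t) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;

	/* guard against signed overflow in the addition below */
	assert(pg_us <= INT64_MAX - epoch_shift_us);
	return (pg_us + epoch_shift_us) / 1000;
}

/* Nonzero when two microsecond timestamps map to the same v7 millisecond. */
int
same_v7_timestamp(int64_t a_us, int64_t b_us)
{
	return pg_us_to_unix_ms(a_us) == pg_us_to_unix_ms(b_us);
}
```

Two inputs 1 us apart usually land in the same millisecond, so the embedded timestamp alone cannot order them, and the extracted time cannot equal the input exactly.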

Best regards, Andrey Borodin.

#70

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey M. Borodin (#69)

Re: UUID v7

Hi,

Also, please note that uuidv7(time+1us) and uuidv7(time) will have the same internal timestamp, so despite time+1us > time, the second UUID will still be greater.

Both invariants you proposed cannot be reasonably guaranteed. Upholding any of them greatly reduces usability of UUID v7.

Again, personally I don't insist on the 1us precision [1]. Only the
fact that timestamp from the far past generates UUID from the future
bothers me.

[1]: /messages/by-id/CAJ7c6TPCSprWwVNdOB==pgKZPqO5q=HRgmU7zmYqz9Dz5ffVYw@mail.gmail.com

--
Best regards,
Aleksander Alekseev

#71

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#70)

1 attachment(s)

Re: UUID v7

On 24 Jan 2024, at 20:46, Aleksander Alekseev <aleksander@timescale.com> wrote:

Only the
fact that timestamp from the far past generates UUID from the future
bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause failures in ID generation unexpected to the user...
See tests

+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected

Range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not sure we should give this information in error message...
Thanks!
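
The range quoted above follows from the 48-bit width of the millisecond field. A minimal sketch of such a guard, with hypothetical names (`V7_MAX_UNIX_MS`, `v7_unix_ms_in_range`, `v7_ts_field`) that are not taken from the attached patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest value that fits the 6-byte unix_ts_ms field: 2^48 - 1. */
#define V7_MAX_UNIX_MS INT64_C(0xFFFFFFFFFFFF)

/* True when a millisecond count since 1970 is representable in UUID v7. */
bool
v7_unix_ms_in_range(int64_t unix_ms)
{
	return unix_ms >= 0 && unix_ms <= V7_MAX_UNIX_MS;
}

/*
 * Return the value to store in the timestamp field. The caller is expected
 * to have rejected out-of-range input already (for example with an error).
 */
uint64_t
v7_ts_field(int64_t unix_ms)
{
	assert(v7_unix_ms_in_range(unix_ms));
	return (uint64_t) unix_ms;
}
```

The upper bound is 281474976710655 ms; its fractional part, 655 ms, matches the ".655" in the upper limit given above.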

Best regards, Andrey Borodin.

Attachments:

v13-0001-Implement-UUID-v7.patch (application/octet-stream)

From 5ae983c668ed5e17c0b34faefbc62b8200704bb7 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v13] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation.
Most important function here is uuidv7() which generates
new UUID according to new standard. This function can optionally
accept a timestamp used instead of current time. This allows
implementation of k-way sortable identifiers.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().
Also we add a function to extract timestamp from UUID v1, v6 and v7.
To allow user to distinguish various UUID versions and variants
we add functions uuid_extract_ver() and uuid_extract_var().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
---
 doc/src/sgml/func.sgml                   |  49 +++++-
 src/backend/utils/adt/uuid.c             | 200 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  19 +++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |  13 +-
 src/test/regress/expected/uuid.out       | 105 ++++++++++++
 src/test/regress/sql/uuid.sql            |  41 +++++
 7 files changed, 423 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5030a1045f..ec09d06bd8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,56 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_ver</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_var</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
+   versions and variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts a version bits from UUID of variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..f88ae95710 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,198 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+static bool external_times_used = false;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = (tms <= previous_timestamp) && !external_times_used;
+		external_times_used = false;
+	}
+	else
+	{
+		ts = PG_GETARG_TIMESTAMPTZ(0);
+		tms = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC) / 1000;
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+		external_times_used = true;
+		if (tms & ~0xFFFFFFFFFFFF)
+		{
+			/* The standard allows only 6bytes of tms */
+			elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes");
+		}
+	}
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e4115cd084..16eabfbcbd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,25 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7', proisstrict => 'f',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'timestamptz', prosrc => 'uuidv7',
+  proargnames => '{unix_ts_ms}', proargmodes => '{i}' },
+{ oid => '9898', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9899', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9900', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..1c37533975 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -872,6 +873,12 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuidv7(timestamp with time zone)
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..22488d5990 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,110 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+ ?column? 
+----------
+ @ 0
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+ERROR:  Time argument of UUID v7 cannot exceed 6 bytes
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected
+ERROR:  Time argument of UUID v7 cannot exceed 6 bytes
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..40c3152697 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,46 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)

#72

Marcos Pegoraro

marcos@f10.com.br

almost 2 years ago

In reply to: Andrey M. Borodin (#71)

Re: UUID v7

Is starting from 1970 enough?
What if a user wants a UUID for their birth date?

regards
Marcos

On Wed, Jan 24, 2024 at 13:54, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 24 Jan 2024, at 20:46, Aleksander Alekseev <aleksander@timescale.com> wrote:

Only the
fact that timestamp from the far past generates UUID from the future
bothers me.

PFA implementation of guard checks, but I'm afraid that this can cause
failures in ID generation unexpected to the user...
See tests
+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected
The range is from 1970-01-01 00:00:00 to 10889-08-02 05:31:50.655. I'm not
sure we should give this information in the error message...
Thanks!

Best regards, Andrey Borodin.
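
The upper end of the range quoted above follows from the width of the unix_ts_ms field, which holds 48 bits of milliseconds. As a rough illustration only, the sketch below splits the largest 48-bit value into whole days since 1970-01-01 and a time of day. The names `UUIDV7_MAX_UNIX_MS`, `ms_parts` and `split_unix_ms` are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Largest value that fits in the 48-bit unix_ts_ms field of a UUID v7. */
#define UUIDV7_MAX_UNIX_MS UINT64_C(0xFFFFFFFFFFFF)

/* Hypothetical helper type: a millisecond count broken into parts. */
typedef struct
{
    uint64_t days;        /* whole days since 1970-01-01 */
    uint32_t secs_of_day; /* seconds into that day */
    uint32_t millis;      /* leftover milliseconds */
} ms_parts;

/* Split milliseconds since the Unix epoch into days, seconds and millis. */
ms_parts
split_unix_ms(uint64_t ms)
{
    ms_parts p;
    uint64_t secs = ms / 1000;

    p.millis = (uint32_t) (ms % 1000);
    p.days = secs / 86400;
    p.secs_of_day = (uint32_t) (secs % 86400);
    return p;
}
```

For `UUIDV7_MAX_UNIX_MS` this yields 19910 seconds into the day (05:31:50) plus 655 milliseconds, which agrees with the time of day in the upper bound above. Turning the day count into a calendar date is left out of the sketch.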

#73

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Marcos Pegoraro (#72)

Re: UUID v7

On 24 Jan 2024, at 22:00, Marcos Pegoraro <marcos@f10.com.br> wrote:

Is enough from 1970 ?

Per the standard, the unix_ts_ms field is the number of milliseconds since the Unix epoch, 1970-01-01.

How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamps. They are unique, and v7 promotes data locality via time-ordering.

Best regards, Andrey Borodin.
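
To make the field layout concrete, here is a minimal sketch of reading unix_ts_ms back out of a UUID v7. It assumes the field is the first six bytes in big-endian order, as the standard describes. The function name `uuidv7_unix_ms` is hypothetical, and the caller is assumed to have checked the version and variant already.

```c
#include <stdint.h>

/*
 * Read the 48-bit big-endian unix_ts_ms field from the first six bytes
 * of a UUID v7. No version or variant check is done here.
 */
uint64_t
uuidv7_unix_ms(const uint8_t uuid[16])
{
    uint64_t ms = 0;

    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | uuid[i];
    return ms;
}
```

Applied to the UUID quoted above, the leading bytes 00 78 c1 35 bd 00 decode to 518637600000 ms, that is 1986-06-08 18:00 UTC.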

#74

Marcos Pegoraro

marcos@f10.com.br

almost 2 years ago

In reply to: Andrey Borodin (#73)

Re: UUID v7

I understand your point, but
'2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid
timestamps.

So it looks strange if a user can do
select uuidv7(TIMESTAMP '2000-01-01')
but cannot do
select uuidv7(TIMESTAMP '1900-01-01')

Regards
Marcos

On Wed, Jan 24, 2024 at 14:51, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 24 Jan 2024, at 22:00, Marcos Pegoraro <marcos@f10.com.br> wrote:

Is enough from 1970 ?

Per standard unix_ts_ms field is a number of milliseconds from UNIX start
date 1970-01-01.

How about if user wants to have an UUID of his birth date ?

I've claimed my
0078c135-bd00-70b1-865a-63c3741922a5

But again, UUIDs are not designed to store timestamp. They are unique and
v7 promote data locality via time-ordering.

Best regards, Andrey Borodin.

#75

postgres@jeltef.nl

almost 2 years ago

In reply to: Marcos Pegoraro (#74)

Re: UUID v7

On Wed, 24 Jan 2024 at 21:47, Marcos Pegoraro <marcos@f10.com.br> wrote:

I understand your point, but
'2000-01-01' :: timestamp and '1900-01-01' :: timestamp are both valid timestamps.

So looks strange if user can do
select uuidv7(TIMESTAMP '2000-01-01')
but cannot do
select uuidv7(TIMESTAMP '1900-01-01')

I think that would be okay honestly. I don't think there's any
reasonable value for the uuid when a timestamp is given outside of the
date range that the uuid7 "algorithm" supports.

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).
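
The check being agreed on here can be sketched as follows. This is an illustration under stated assumptions, not the patch's code: it takes a Postgres-style timestamp (microseconds since 2000-01-01), shifts it to the Unix epoch, and refuses anything that does not fit the 48-bit unix_ts_ms field. The names `PG_EPOCH_OFFSET_US`, `UUIDV7_MAX_MS` and `pg_us_to_uuidv7_ms` are hypothetical. The sketch also rejects any instant before 1970-01-01, even by a single microsecond; that strictness is a choice of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Microseconds between 1970-01-01 and 2000-01-01 (10957 days). */
#define PG_EPOCH_OFFSET_US (INT64_C(10957) * 86400 * 1000000)
/* Largest millisecond value that fits in 48 bits. */
#define UUIDV7_MAX_MS INT64_C(0xFFFFFFFFFFFF)

/*
 * Hypothetical helper: convert microseconds since 2000-01-01 to Unix
 * milliseconds. Returns false when the result is outside the range a
 * UUID v7 can carry; *unix_ms is left untouched in that case.
 */
bool
pg_us_to_uuidv7_ms(int64_t pg_us, uint64_t *unix_ms)
{
    int64_t unix_us;

    /* guard the epoch shift itself against signed overflow */
    if (pg_us > INT64_MAX - PG_EPOCH_OFFSET_US)
        return false;

    unix_us = pg_us + PG_EPOCH_OFFSET_US;
    if (unix_us < 0 || unix_us / 1000 > UUIDV7_MAX_MS)
        return false;

    *unix_ms = (uint64_t) (unix_us / 1000);
    return true;
}
```

Returning a flag instead of raising an error keeps the sketch free of server headers; inside the backend the failing branch would presumably report an error instead.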

#76

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Aleksander Alekseev (#68)

Re: UUID v7

"Other people" think that extracting the timestamp from UUIDv7 in violation of the new RFC, and generating UUIDv7 from the timestamp were both terrible and poorly thought out ideas. The authors of the new RFC had very good reasons to prohibit this. And the problems you face are the best confirmation of the correctness of the new RFC. It’s better to throw all this gag out of the official patch. Don't tempt developers to break the new RFC with these error-producing functions.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Wednesday, 24 January 2024 at 04:30:02 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

The function to extract the timestamp does not provide any guarantees at all. The standard states this; see Kyzer's answers upthread.
Moreover, the standard urges against relying on the assumption that if uuidX was generated before uuidY, then uuidX < uuidY. The standard does a lot to make this happen, but does not guarantee it.
All that is guaranteed is uniqueness under certain conditions.

Otherwise you can calculate crc64(X) or sha256(X)
internally in order to generate a unique ID and claim that it's fine.

Values that violate named invariants should be rejected with an error.

Think about the value that you pass to uuid generation function as an entropy. It’s there to ensure uniqueness and promote ordering (but not guarantee).

This being said, I understand your point of view too. Let's see what
other people think.

--
Best regards,
Aleksander Alekseev

#77

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Nikolay Samokhvalov (#60)

Re: UUID v7

That's right! There is no point in waiting for the official approval of the new RFC, which obviously will not change anything. I have been a contributor to this RFC for several years, and I can testify that every aspect imaginable has been thoroughly researched and agreed upon. Definitely nothing new will appear in the new RFC.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov <nik@postgres.ai> wrote:

On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

Thank you, Andrey! I have just checked v12 – it applied cleanly to HEAD, and the functions work well. I especially like the fact that we keep uuid_extract_time(..) here – this is a great thing to have for time-based partitioning, and in many cases we will be able to decide not to have a creation timestamp column (e.g., "created_at") at all, saving 8 bytes.
The docs and comments look great too.
Overall, the patch looks mature enough. It would be great to have it in pg17. Yes, the RFC is not fully finalized yet, but it's very close. And many libraries are already including implementation of UUIDv7 – here are some examples:
- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139
Nik

#78

nik@postgres.ai

almost 2 years ago

In reply to: Sergey Prokhorenko (#77)

Re: UUID v7

On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <
sergeyprokhorenko@yahoo.com.au> wrote:

That's right! There is no point in waiting for the official approval of
the new RFC, which obviously will not change anything. I have been a
contributor to this RFC
<https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.html#name-acknowledgements>
for several years, and I can testify that every aspect imaginable has been
thoroughly researched and agreed upon. Nothing new will definitely appear
in the new RFC.

From a practical point of view, these two things are extremely important to
have to support partitioning. It is better to implement limitations than
throw them away.

Without them, this functionality will be of very limited use in
databases. We need to think about large tables – which means partitioning.

Nik

#79

nik@postgres.ai

almost 2 years ago

In reply to: Nikolay Samokhvalov (#78)

Re: UUID v7

On Wed, Jan 24, 2024 at 8:40 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:

On Wed, Jan 24, 2024 at 1:52 PM Sergey Prokhorenko <
sergeyprokhorenko@yahoo.com.au> wrote:

That's right! There is no point in waiting for the official approval of
the new RFC, which obviously will not change anything. I have been a
contributor to this RFC
<https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-14.html#name-acknowledgements>
for several years, and I can testify that every aspect imaginable has been
thoroughly researched and agreed upon. Nothing new will definitely
appear in the new RFC.

From a practical point of view, these two things are extremely important
to have to support partitioning. It is better to implement limitations than
throw them away.

Without them, this functionality will be of a very limited use in
databases. We need to think about large tables – which means partitioning.

apologies -- this was a response to another email from you:

"Other people" think that extracting the timestamp from UUIDv7 in
violation of the new RFC, and generating UUIDv7 from the timestamp were
both terrible and poorly thought out ideas. The authors of the new RFC had
very good reasons to prohibit this. And the problems you face are the best
confirmation of the correctness of the new RFC. It’s better to throw all
this gag out of the official patch. Don't tempt developers to break the new
RFC with these error-producing functions.

#80

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Nikolay Samokhvalov (#78)

Re: UUID v7

On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Postgres has always been a bit hackerish, allowing slightly more than is safe. E.g., you can define an immutable function that is not really immutable, or turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract the timestamp, even if it can return strange values for imprecise RFC implementations.

On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose a better error message?

Best regards, Andrey Borodin.

#81

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#80)

Re: UUID v7

I am against turning the DBMS into another C++, in which they do not so much design something new as fix bugs in production after a crash.
As for partitioning, I already wrote to Andrey Borodin that we need a special function to generate a partition id using the UUIDv7 timestamp or even simultaneously with the generation of the timestamp. For example, every month (or so, since precision is not needed here) a new partition is created. Here's a good example: https://elixirforum.com/t/partitioning-postgres-tables-by-timestamp-based-uuids/60916
But without a separate function for extracting the entire timestamp from the UUID! Let's solve this specific problem, and not give the developers a grenade with the safety removed. Many developers have already decided to store the timestamp in UUIDv7, so as not to create a separate created_at field. Then they will delete table records with the old timestamp, etc. Horrible mistakes are simply guaranteed.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au
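
One way to read the proposal above is a function that maps a UUID v7 straight to a coarse partition number without ever handing the full timestamp to the caller. The sketch below is only an illustration of that idea. The 30-day bucket width and the name `uuidv7_partition_id` are assumptions; neither comes from the thread or from the patch.

```c
#include <stdint.h>

/* Assumed bucket width: 30 days in milliseconds ("a month or so"). */
#define PARTITION_WIDTH_MS (UINT64_C(30) * 86400 * 1000)

/*
 * Hypothetical helper: derive a coarse partition number from the leading
 * 48-bit unix_ts_ms field of a UUID v7. Only the bucket index is returned.
 */
uint32_t
uuidv7_partition_id(const uint8_t uuid[16])
{
    uint64_t ms = 0;

    /* the first six bytes hold unix_ts_ms in big-endian order */
    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | uuid[i];

    return (uint32_t) (ms / PARTITION_WIDTH_MS);
}
```

With this width, the v7 test value from the regression tests, 017F22E2-79B0-7CC3-98C4-DC0C0C07398F, lands in bucket 634. Fixed-width buckets do not line up with calendar months, which matches the remark that precision is not needed here.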

On Thursday, 25 January 2024 at 09:51:58 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?

Best regards, Andrey Borodin.

#82

przemyslaw@sztoch.pl

almost 2 years ago

In reply to: Andrey M. Borodin (#80)

Re: UUID v7

Andrey M. Borodin wrote on 25.01.2024 07:51:

On 25 Jan 2024, at 09:40, Nikolay Samokhvalov <nik@postgres.ai> wrote:

From a practical point of view, these two things are extremely important to have to support partitioning. It is better to implement limitations than throw them away.

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.

On 25 Jan 2024, at 02:15, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?

+1 for erroring when ts is outside range.

v13 looks good to me. I think we have reached an optimal compromise.

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#83

aleksander@timescale.com

almost 2 years ago

In reply to: Przemysław Sztoch (#82)

Re: UUID v7

Hi,

Postgres always was a bit hackerish, allowing slightly more then is safe. I.e. you can define immutable function that is not really immutable, turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract timestamp. Even if it can return strange values for imprecise RFC implementation.

Completely agree.

Users that don't like or don't need it can pretend there are no
uuid_extract_time() and uuidv7(T) in Postgres. If we don't provide
them however, users that need them will end up writing their own
probably buggy and not compatible implementations. That would be much
worse.

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose better error message?

+1 for erroring when ts is outside range.

v13 looks good for me. I think we have reached a optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev

#84

aleksander@timescale.com

almost 2 years ago

In reply to: Aleksander Alekseev (#83)

1 attachment(s)

Re: UUID v7

Hi,

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

PFA v14. I changed:

```
elog(ERROR, "Time argument of UUID v7 cannot exceed 6 bytes");
```

... to:

```
ereport(ERROR,
        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
         errmsg("Time argument of UUID v7 is outside of the valid range")));
```

Which IMO tells a bit more to the average user and is translatable.

At a quick glance, the patch needs improving English, IMO.

Agree. We could use some help from a native English speaker for this.

--
Best regards,
Aleksander Alekseev

Attachments:

v14-0001-Implement-UUID-v7.patch (application/octet-stream)

From 874d7653f0345a93db3b6b8d954061d073d37915 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v14] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
This function can optionally accept a timestamp used instead of current time.
This allows implementation of k-way sortable identifiers. For code readability
this commit adds alias uuidv4() to function gen_random_uuid().

Also we add a function to extract timestamp from UUID v1, v6 and v7.
To allow user to distinguish various UUID versions and variants
we add functions uuid_extract_ver() and uuid_extract_var().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  49 +++++-
 src/backend/utils/adt/uuid.c             | 202 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  19 +++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |  13 +-
 src/test/regress/expected/uuid.out       | 105 ++++++++++++
 src/test/regress/sql/uuid.sql            |  41 +++++
 7 files changed, 425 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 968e8d59fb..74fbb982ab 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,56 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_ver</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_var</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when an application prefers locality of identifiers.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
+   versions and variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..665e27f498 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,200 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+static bool external_times_used = false;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	if (PG_NARGS() == 0 || PG_ARGISNULL(0))
+	{
+		gettimeofday(&tp, NULL);
+		tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+		/* time from clock is protected from backward leaps */
+		increment_counter = (tms <= previous_timestamp) && !external_times_used;
+		external_times_used = false;
+	}
+	else
+	{
+		ts = PG_GETARG_TIMESTAMPTZ(0);
+		tms = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC) / 1000;
+		/*
+		 * The time can leap backwards when provided by the user, so we use
+		 * counter only when called with exactly same unix_ts_ms argument.
+		 */
+		increment_counter = (tms == previous_timestamp);
+		external_times_used = true;
+		if (tms & ~0xFFFFFFFFFFFF)
+		{
+			/* The standard allows only 6bytes of tms */
+			ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("Time argument of UUID v7 is outside of the valid range")));
+		}
+	}
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d..414a9a417f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,25 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7', proisstrict => 'f',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'timestamptz', prosrc => 'uuidv7',
+  proargnames => '{unix_ts_ms}', proargmodes => '{i}' },
+{ oid => '9898', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9899', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9900', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..1c37533975 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -872,6 +873,12 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuidv7(timestamp with time zone)
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..e85174ae82 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,110 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+ ?column? 
+----------
+ @ 0
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+ERROR:  Time argument of UUID v7 is outside of the valid range
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+ ?column? 
+----------
+        1
+(1 row)
+
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected
+ERROR:  Time argument of UUID v7 is outside of the valid range
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..40c3152697 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,46 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7 with same unix_ts_ms
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(now()));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- check that timestamp is extracted correctly
+SELECT uuid_extract_time(uuidv7(TIMESTAMP '2024-01-16 13:37:00')) - TIMESTAMP '2024-01-16 13:37:00';
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
+-- errors in edge cases of UUID v7
+SELECT 1 FROM uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '0ms');
+SELECT uuidv7('1970-01-01 00:00:00+00'::timestamptz - interval '1ms'); -- ERROR expected
+SELECT 1 FROM uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000'));
+SELECT uuidv7(uuid_extract_time('FFFFFFFF-FFFF-7FFF-B000-000000000000')+'1ms'); -- ERROR expected
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.43.0

#85

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Aleksander Alekseev (#83)

Re: UUID v7

Aleksander,

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Thursday, 25 January 2024 at 03:06:50 pm GMT+3, Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

Postgres has always been a bit hackerish, allowing slightly more than is safe. E.g., you can define an immutable function that is not really immutable, or turn off autovacuum or fsync. Why bother with safety guards here?
My opinion is that we should have this function to extract the timestamp, even if it can return strange values for imprecise RFC implementations.

Completely agree.

So +1 for erroring when you provide a timestamp outside of that range
(either too far in the past or too far in the future).

OK, it seems like we have some consensus on ERRORing..

Do we have any other open items? Does v13 address all open items? Maybe let’s compose a better error message?

+1 for erroring when ts is outside range.

v13 looks good to me. I think we have reached an optimal compromise.

Andrey, many thanks for the updated patch.

LGTM, cfbot is happy and I don't think we have any open items left. So
changing CF entry status back to RfC.

--
Best regards,
Aleksander Alekseev

#86

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Sergey Prokhorenko (#77)

Re: UUID v7

By the way, the Go language has also already implemented a function for UUIDv7: https://pkg.go.dev/github.com/gofrs/uuid#NewV7

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Thursday, 25 January 2024 at 12:49:46 am GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Monday, 22 January 2024 at 07:22:32 am GMT+3, Nikolay Samokhvalov <nik@postgres.ai> wrote:

On Fri, Jan 19, 2024 at 10:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 19 Jan 2024, at 13:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Also, I've added some documentation on all functions.

Here's v12. Changes:
1. Documentation improvements
2. Code comments
3. Better commit message and reviews list

#87

postgres@jeltef.nl

almost 2 years ago

In reply to: Sergey Prokhorenko (#85)

Re: UUID v7

tl;dr I believe we should remove the uuidv7(timestamp) function from
this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

After re-reading the RFC more diligently, I'm inclined to agree with
Sergey that uuidv7(timestamp) is quite problematic. And I would even
say that we should not provide uuidv7(timestamp) at all, and instead
should only provide uuidv7(). Providing an explicit timestamp for
UUIDv7 is explicitly against the spec (in my reading):

Implementations acquire the current timestamp from a reliable
source to provide values that are time-ordered and continually
increasing. Care must be taken to ensure that timestamp changes
from the environment or operating system are handled in a way that
is consistent with implementation requirements. For example, if
it is possible for the system clock to move backward due to either
manual adjustment or corrections from a time synchronization
protocol, implementations need to determine how to handle such
cases. (See Altering, Fuzzing, or Smearing below.)

...

UUID version 1 and 6 both utilize a Gregorian epoch timestamp
while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp
sources or a custom timestamp epoch are required, UUIDv8 MUST be
used.

...

Monotonicity (each subsequent value being greater than the last) is
the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually
increasing timestamp for our UUIDv7 generation, and thus it would not
be a valid UUIDv7 implementation.
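
To illustrate what "continually increasing" means for the timestamp part, here is a minimal sketch in C. It is not taken from the patch: the helper name monotonic_unix_ms() and the idea of keeping the previous value in a caller-owned variable are assumptions made only for this example. A user-supplied timestamp bypasses exactly this kind of guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: returns a millisecond timestamp
 * that never decreases, even if the wall clock steps backward.
 * *last_ms holds the value returned by the previous call (0 initially).
 */
uint64_t
monotonic_unix_ms(uint64_t wall_clock_ms, uint64_t *last_ms)
{
	assert(last_ms != NULL);

	/* Clock went backward: keep using the largest value seen so far. */
	if (wall_clock_ms < *last_ms)
		return *last_ms;

	*last_ms = wall_clock_ms;
	return wall_clock_ms;
}
```

With such a guard, a wall clock that steps back from 1005 ms to 990 ms still yields 1005, so later UUIDs never sort before earlier ones on the timestamp field.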

I do agree with others however, that being able to pass in an
arbitrary timestamp for UUID generation would be very useful. For
example to be able to partition by the timestamp in the UUID and then
being able to later load data for an older timestamp and have it be
added to the older partition. But it's possible to do that while
still following the spec, by using a UUIDv8 instead of UUIDv7. So for
this use case we could make a helper function that generates a UUIDv8
using the same format as a UUIDv7, but allows storing arbitrary
timestamps. You might say, why not slightly change UUIDv7 then? Well,
mainly because of this critical sentence in the RFC:

UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires
careful usage and checks if uniqueness is required.

So I believe we should remove the uuidv7(timestamp) function from this patchset.

I don't see a problem with including uuid_extract_time though. Afaict
the only thing the RFC says about extracting timestamps is that the
RFC does not give a requirement or guarantee about how close the
stored timestamp is to the actual time:

Implementations MAY alter the actual timestamp. Some examples
include security considerations around providing a real clock
value within a UUID, to correct inaccurate clocks, to handle leap
seconds, or instead of dividing a number of microseconds by 1000
to obtain a millisecond value; dividing by 1024 (or some other
value) for performance reasons. This specification makes no
requirement or guarantee about how close the clock value needs to
be to the actual time.

I see no reason why we cannot make stronger guarantees about the
timestamps that we use to generate UUIDs with our uuidv7() function.
And then we can update the documentation for
uuid_extract_time to something like this:

This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
versions and variants this function returns NULL. The extracted timestamp
does not necessarily equate to the time of UUID generation. How close it is
to the actual time depends on the implementation that generated the UUID.
The uuidv7() function provided by PostgreSQL will normally store the actual time of
generation in the UUID, but if large batches of UUIDs are generated at the
same time it's possible that some UUIDs will store a time that is slightly later
than their actual generation time.
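
As a rough illustration of what such an extraction involves for version 7 only, here is a small C sketch. It is not the patch's code: the function names are hypothetical, and it relies only on the layout from the RFC draft (unix_ts_ms in the first 48 bits, version in the high nibble of octet 6, variant in the top two bits of octet 8).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Version: high nibble of octet 6. */
int
uuid_version(const uint8_t uuid[16])
{
	assert(uuid != NULL);
	return uuid[6] >> 4;
}

/* Variant: top two bits of octet 8; binary 10 (value 2) is the RFC variant. */
int
uuid_variant(const uint8_t uuid[16])
{
	assert(uuid != NULL);
	return uuid[8] >> 6;
}

/*
 * Stores milliseconds since the Unix epoch in *ms and returns 1 if the
 * input is an RFC-variant version 7 UUID; returns 0 otherwise.
 */
int
uuidv7_unix_ms(const uint8_t uuid[16], uint64_t *ms)
{
	uint64_t	result = 0;
	int			i;

	assert(ms != NULL);

	if (uuid_variant(uuid) != 2 || uuid_version(uuid) != 7)
		return 0;

	/* unix_ts_ms occupies octets 0..5 in big-endian order. */
	for (i = 0; i < 6; i++)
		result = (result << 8) | uuid[i];

	*ms = result;
	return 1;
}
```

For the UUID 017F22E2-79B0-7CC3-98C4-DC0C0C07398F used in the regression tests, this yields 1645557742000 ms, i.e. 2022-02-22 19:22:22 UTC. As the proposed wording says, that value is only as close to the real generation time as the generating implementation made it.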

#88

postgres@jeltef.nl

almost 2 years ago

In reply to: Aleksander Alekseev (#84)

Re: UUID v7

On Thu, 25 Jan 2024 at 13:31, Aleksander Alekseev
<aleksander@timescale.com> wrote:

PFA v14.

+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. This is the most commonly
+   used type of UUID and is appropriate when random distribution of keys does
+   not affect performance of an application.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 (time-ordered + random) UUID. This UUID
+   version should be used when application prefers locality of identifiers.
+<synopsis>

I think it would be good to explain the tradeoffs between uuidv4 and
uuidv7 a bit better. How about changing the docs to something like
this:

<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
</synopsis>
Both functions return a version 4 (random) UUID. UUIDv4 is one of the
most commonly used types of UUID. It is appropriate when random
distribution of keys does not affect performance of an application or
when exposing the generation time of a UUID has unacceptable security
or business intelligence implications.
<synopsis>
<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
</synopsis>
This function returns a version 7 (time-ordered + random) UUID. It
provides much better data locality than UUIDv4, which can greatly
improve performance when UUID is used in a BTREE index (the default
index type in PostgreSQL). To achieve this data locality, UUIDv7
embeds its own generation time into the UUID. If exposing such a
timestamp has unacceptable security or business intelligence
implications, then uuidv4() should be used instead.
<synopsis>

#89

Junwang Zhao

zhjwpku@gmail.com

almost 2 years ago

In reply to: Jelte Fennema-Nio (#87)

Re: UUID v7

On Mon, Jan 29, 2024 at 7:38 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

tl;dr I believe we should remove the uuidv7(timestamp) function from
this patchset.

On Thu, 25 Jan 2024 at 18:04, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

After re-reading the RFC more diligently, I'm inclined to agree with
Sergey that uuidv7(timestamp) is quite problematic. And I would even
say that we should not provide uuidv7(timestamp) at all, and instead
should only provide uuidv7(). Providing an explicit timestamp for
UUIDv7 is explicitly against the spec (in my reading):

Implementations acquire the current timestamp from a reliable
source to provide values that are time-ordered and continually
increasing. Care must be taken to ensure that timestamp changes
from the environment or operating system are handled in a way that
is consistent with implementation requirements. For example, if
it is possible for the system clock to move backward due to either
manual adjustment or corrections from a time synchronization
protocol, implementations need to determine how to handle such
cases. (See Altering, Fuzzing, or Smearing below.)

...

UUID version 1 and 6 both utilize a Gregorian epoch timestamp
while UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp
sources or a custom timestamp epoch are required, UUIDv8 MUST be
used.

...

Monotonicity (each subsequent value being greater than the last) is
the backbone of time-based sortable UUIDs.

By allowing users to provide a timestamp we're not using a continually
increasing timestamp for our UUIDv7 generation, and thus it would not
be a valid UUIDv7 implementation.

I do agree with others however, that being able to pass in an
arbitrary timestamp for UUID generation would be very useful. For
example to be able to partition by the timestamp in the UUID and then
being able to later load data for an older timestamp and have it be
added to the older partition. But it's possible to do that while
still following the spec, by using a UUIDv8 instead of UUIDv7. So for
this use case we could make a helper function that generates a UUIDv8
using the same format as a UUIDv7, but allows storing arbitrary
timestamps. You might say, why not slightly change UUIDv7 then? Well,
mainly because of this critical sentence in the RFC:

UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.

That would allow us to say that using this UUIDv8 helper requires
careful usage and checks if uniqueness is required.

So I believe we should remove the uuidv7(timestamp) function from this patchset.

Agreed, section 6.1 of the RFC [1] (https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14#section-6.1-2.4.1) has the following statements:

```
UUID version 1 and 6 both utilize a Gregorian epoch timestamp while
UUIDv7 utilizes a Unix Epoch timestamp. If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.
```

In contrib/uuid-ossp, uuidv1 does not allow the user to supply a
custom timestamp,
so I think it should be the same for uuidv6 and uuidv7.

And I have the same feeling that we should not consider v6 and v8 in
this patch.

I don't see a problem with including uuid_extract_time though. Afaict
the only thing the RFC says about extracting timestamps is that the
RFC does not give a requirement or guarantee about how close the
stored timestamp is to the actual time:

Implementations MAY alter the actual timestamp. Some examples
include security considerations around providing a real clock
value within a UUID, to correct inaccurate clocks, to handle leap
seconds, or instead of dividing a number of microseconds by 1000
to obtain a millisecond value; dividing by 1024 (or some other
value) for performance reasons. This specification makes no
requirement or guarantee about how close the clock value needs to
be to the actual time.

I see no reason why we cannot make stronger guarantees about the
timestamps that we use to generate UUIDs with our uuidv7() function.
And then we can update the documentation for
uuid_extract_time to something like this:

This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
versions and variants this function returns NULL. The extracted timestamp
does not necessarily equate to the time of UUID generation. How close it is
to the actual time depends on the implementation that generated the UUID.
The uuidv7() function provided by PostgreSQL will normally store the actual time of
generation in the UUID, but if large batches of UUIDs are generated at the
same time it's possible that some UUIDs will store a time that is slightly later
than their actual generation time.

--
Regards
Junwang Zhao

#90

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Sergey Prokhorenko (#85)

Re: UUID v7

On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Aleksander,

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not recommended for production must be based on some real threats.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

Even if the developer passes a constant time to uuidv7(T), they will get what they asked for: a unique identifier. Moreover, it will still keep locality. There will be no negative consequences at all.
On the contrary, an experienced developer can leverage the parameter when data locality should be reduced. If you have several streams of data, you might want to introduce some shift to reduce contention.
For example, you can generate uuidv7(now() + '1 day' * random(0,10)). This will split 1 contention point into 10 and increase ingestion performance 10-fold.

On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:

If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :) I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement over previous versions.

Jelte, your documentation corrections look good to me, I'll include them in the next version.

Thanks!

Best regards, Andrey Borodin.

#91

postgres@jeltef.nl

almost 2 years ago

In reply to: Andrey M. Borodin (#90)

Re: UUID v7

On Mon, 29 Jan 2024 at 19:32, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Even if the developer passes a constant time to uuidv7(T), they will get what they asked for: a unique identifier. Moreover, it will still keep locality. There will be no negative consequences at all.

It will be significantly "less unique" than if they didn't pass a
constant time. Basically it would become a UUIDv4, but with 74 bits of
random data instead of 122. That might not be enough anymore to
"guarantee" uniqueness. I guess that's why it is required to use
UUIDv8 in these cases, because correct usage is now a requirement for
assuming uniqueness. And for UUIDv8 the spec says this:

UUIDv8's uniqueness will be implementation-specific and MUST NOT be assumed.
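
To put rough numbers on "less unique", here is a back-of-the-envelope sketch in C. It takes the 74-bit and 122-bit figures above at face value and uses the standard birthday approximation p ~= n * (n - 1) / 2^(bits + 1), which only holds while p is small. The function names are made up for this example.

```c
#include <assert.h>

/*
 * Bits left for randomness in a 128-bit UUID after the 4 version bits,
 * the 2 variant bits and any bits fixed by the caller (for example a
 * constant 48-bit timestamp).
 */
int
uuid_random_bits(int fixed_bits)
{
	assert(fixed_bits >= 0 && fixed_bits <= 122);
	return 128 - 4 - 2 - fixed_bits;
}

/*
 * Birthday approximation of the probability that at least two of n
 * identifiers collide when each carries random_bits random bits.
 * Only meaningful while the result is much smaller than 1.
 */
double
approx_collision_probability(double n, int random_bits)
{
	double		p = n * (n - 1.0) / 2.0;
	int			i;

	assert(random_bits >= 0);

	/* Divide by 2^random_bits without needing libm. */
	for (i = 0; i < random_bits; i++)
		p /= 2.0;

	return p;
}
```

For one billion identifiers this gives about 2.6e-5 with 74 random bits versus about 9.4e-20 with 122 bits, a factor of 2^48.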

On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:

If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.

Well, yeah. RFC says this... in 4 capital letters :)

As an FYI, there is an RFC that defines these keywords, which is why they
are in capital letters: https://www.ietf.org/rfc/rfc2119.txt

I believe it's kind of a big deficiency that k-way sortable identifiers are not implementable on top of UUIDv7. Well, let's go without this function. UUIDv7 is still an improvement over previous versions.

Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
But following the spec seems more important than a nice feature to me.

#92

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#90)

Re: UUID v7

Andrey,
I understand and agree with your goals. But instead of dangerous universal functions, it is better to develop safe highly specialized functions that implement only these goals.
There should not be a function uuidv7(T) from an arbitrary timestamp, but there should be a special function that implements your algorithm: uuidv8(now() + '1 century' * random(0,10)).
I replaced 1 day with 1 century because the spread of 1 day is too small. Over time, records will be inserted between existing records, which is undesirable.
Similarly, if we need to calculate the partition id, then we do not need to use the uuid_extract_time() function to provide the extracted timestamp, the accuracy of which cannot be guaranteed. Instead, we need to give exactly the partition id, calculated using the uuidv7 timestamp. For example, partitions may have approximately a month interval between each other.
As for the documentation, it must be indicated that the UUIDv7 structure is not timestamp + random, but timestamp + randomly seeded counter + random, like in all advanced implementations.
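
A "timestamp + counter + random" structure could be sketched in C roughly as follows. The field widths here are an assumption for illustration only (a 12-bit counter placed in the bits right after the version nibble); the patch may place and size its counter differently, and uuidv7_pack() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical layout, for illustration only:
 *   octets 0-5  : unix_ts_ms, 48 bits, big-endian
 *   octet  6    : version 7 in the high nibble, counter bits 11..8 below it
 *   octet  7    : counter bits 7..0
 *   octets 8-15 : caller-supplied random bytes, variant forced to binary 10
 */
void
uuidv7_pack(uint8_t out[16], uint64_t unix_ts_ms, uint16_t counter12,
			const uint8_t random8[8])
{
	int			i;

	assert(unix_ts_ms < (UINT64_C(1) << 48));
	assert(counter12 < (1u << 12));

	/* Most significant timestamp byte first, so byte order equals time order. */
	for (i = 0; i < 6; i++)
		out[i] = (uint8_t) (unix_ts_ms >> (40 - 8 * i));

	out[6] = (uint8_t) (0x70 | (counter12 >> 8));
	out[7] = (uint8_t) (counter12 & 0xff);

	memcpy(out + 8, random8, 8);
	out[8] = (uint8_t) ((out[8] & 0x3f) | 0x80);
}
```

Because the counter sits directly below the timestamp, two UUIDs packed within the same millisecond compare in counter order under memcmp(), regardless of the random tail.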

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au
______________________________________________________________

On Monday, 29 January 2024 at 09:32:54 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 25 Jan 2024, at 22:04, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Aleksander,

In this case the documentation must state that the functions uuid_extract_time() and uuidv7(T) are against the RFC requirements, and that developers may use these functions with caution at their own risk, and these functions are not recommended for production environment.

Refining documentation is good. However, saying that these functions are not recommended for production must be based on some real threats.

The function uuidv7(T) is not better than uuid_extract_time(). Careless developers may well pass any business date into this function: document date, registration date, payment date, reporting date, start date of the current month, data download date, and even a constant. This would be a profanation of UUIDv7 with very negative consequences.

On 29 Jan 2024, at 18:58, Junwang Zhao <zhjwpku@gmail.com> wrote:

If other timestamp sources or
a custom timestamp epoch are required, UUIDv8 MUST be used.

Jelte, your documentation corrections look good to me, I'll include them in the next version.

Thanks!

Best regards, Andrey Borodin.

#93

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Jelte Fennema-Nio (#91)

1 attachment(s)

Re: UUID v7

On 30 Jan 2024, at 01:38, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
But following the spec seems more important than a nice feature to me.

PFA v15. Changes: removed timestamp argument, incorporated Jelte’s documentation addons.

Thanks!

Best regards, Andrey Borodin.

Attachments:

v15-0001-Implement-UUID-v7.patch (application/octet-stream)

From 9dde2582821354d4638c822e8d77eee6ff60cfa5 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v15] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
This function can optionally accept a timestamp used instead of current time.
This allows implementation of k-way sortable identifiers. For code readability
this commit adds alias uuidv4() to function gen_random_uuid().

Also we add a function to extract timestamp from UUID v1, v6 and v7.
To allow user to distinguish various UUID versions and variants
we add functions uuid_extract_ver() and uuid_extract_var().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  62 +++++++-
 src/backend/utils/adt/uuid.c             | 179 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 +++++++++
 src/test/regress/sql/uuid.sql            |  26 ++++
 7 files changed, 357 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4a..588dd1ffd34 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14128,13 +14128,69 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_ver</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_var</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   Both functions return a version 4 (random) UUID. UUIDv4 is one of the
+   most commonly used types of UUID. It is appropriate when random
+   distribution of keys does not affect performance of an application or
+   when exposing the generation time of a UUID has unacceptable security
+   or business intelligence implications.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random). It provides much better data locality
+   than UUIDv4, which can greatly improve performance when UUID is used in a
+   B-tree index (the default index type in PostgreSQL). To achieve this data
+   locality, UUIDv7 embeds its own generation time into the UUID. If exposing
+   such a timestamp has unacceptable security or business intelligence
+   implications, then uuidv4() should be used instead.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
+   versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time of
+   generation in the UUID, but if large batches of UUIDs are generated at the
+   same time it's possible that some UUIDs will store a time that is slightly later
+   than their actual generation time.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c73..ef14d6cba27 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,177 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d5..f9be09464be 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e3..652aeb428e2 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d68..f4b9ff654ab 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d6..f401a550885 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d2..c7362cf4e13 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.42.0
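
The byte layout that the patch's uuidv7() builds can be sketched outside the server roughly as follows. This is only an illustration under stated assumptions: the sketch_* helpers are hypothetical names that do not appear in the patch, and the trailing seven bytes are passed in by the caller instead of coming from pg_strong_random().

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): fill 16 bytes following the
 * layout used by the patch's uuidv7(). "tail" stands in for the 7 bytes
 * that the patch fills with random data.
 */
static void
sketch_pack_uuidv7(uint8_t out[16], uint64_t unix_ms, uint32_t counter18,
                   const uint8_t tail[7])
{
    int i;

    /* bytes 0..5: 48-bit big-endian Unix time in milliseconds */
    for (i = 0; i < 6; i++)
        out[i] = (uint8_t) (unix_ms >> (40 - 8 * i));

    /* byte 6: version nibble 0111, then the top 4 bits of the counter */
    out[6] = (uint8_t) (0x70 | ((counter18 >> 14) & 0x0f));
    /* byte 7: the next 8 bits of the counter */
    out[7] = (uint8_t) (counter18 >> 6);
    /* byte 8: variant bits 10, then the low 6 bits of the counter */
    out[8] = (uint8_t) (0x80 | (counter18 & 0x3f));
    /* bytes 9..15: random data in the patch, caller-supplied here */
    memcpy(&out[9], tail, 7);
}

/* Same bit tests as uuid_extract_ver() and uuid_extract_var() */
static int sketch_version(const uint8_t u[16]) { return u[6] >> 4; }
static int sketch_variant(const uint8_t u[16]) { return u[8] >> 6; }

/* Reassemble the 18-bit counter from bytes 6, 7 and 8 */
static uint32_t
sketch_counter(const uint8_t u[16])
{
    return ((uint32_t) (u[6] & 0x0f) << 14) |
           ((uint32_t) u[7] << 6) |
           (uint32_t) (u[8] & 0x3f);
}

/* Reassemble the 48-bit millisecond timestamp from bytes 0..5 */
static uint64_t
sketch_unix_ms(const uint8_t u[16])
{
    uint64_t ms = 0;
    int i;

    for (i = 0; i < 6; i++)
        ms = (ms << 8) | u[i];
    return ms;
}
```

Packing a timestamp and counter and reading them back with the sketch_* accessors should round-trip. The version and variant accessors should then report 7 and 2, the same values the regression test expects from uuid_extract_ver(uuidv7()) and uuid_extract_var(uuidv7()).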

#94

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#93)

Re: UUID v7

Andrey,

I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time. This allows implementation of k-way sortable identifiers."
This phrase is wrong: "Both functions return a version 4 (random) UUID."
For this phrase the reason is unclear and the phrase is most likely incorrect:
if large batches of UUIDs are generated at the same time it's possible that some UUIDs will store a time that is slightly later than their actual generation time

Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au

On Tuesday, 30 January 2024 at 09:55:04 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 30 Jan 2024, at 01:38, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

Yeah, I liked the feature to generate UUIDv7 based on timestamp too.
But following the spec seems more important than a nice feature to me.

PFA v15. Changes: removed timestamp argument, incorporated Jelte’s documentation addons.

Thanks!

Best regards, Andrey Borodin.

#95

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Sergey Prokhorenko (#94)

1 attachment(s)

Re: UUID v7

On 30 Jan 2024, at 12:28, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
This allows implementation of k-way sortable identifiers."

Fixed.

This phrase is wrong: "Both functions return a version 4 (random) UUID.”

This applies to functions gen_random_uuid() and uuidv4().

For this phrase the reason is unclear and the phrase is most likely incorrect:
if large batches of UUIDs are generated at the
same time it's possible that some UUIDs will store a time that is slightly later
than their actual generation time

I’ve rewritten this phrase, hope it’s more clear now.
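
To make the two exceptions concrete, here is a minimal sketch of the counter logic in the attached uuidv7(), with the clock reading and the random counter seed passed in as parameters so the behaviour is deterministic. The names sketch_v7_state and sketch_next_timestamp are hypothetical and exist only for this illustration. The real function reads the clock with gettimeofday() and seeds the counter from pg_strong_random().

```c
#include <stdint.h>

/* Hypothetical per-backend state, mirroring the patch's two static variables */
typedef struct
{
    uint64_t previous_ms;   /* last timestamp written into a UUID */
    uint32_t counter;       /* 18-bit counter */
} sketch_v7_state;

/*
 * Return the millisecond timestamp that would be stored in the next UUID.
 * clock_ms plays the role of the gettimeofday() reading, and seed plays
 * the role of the random bits used to re-seed the counter.
 */
static uint64_t
sketch_next_timestamp(sketch_v7_state *st, uint64_t clock_ms, uint32_t seed)
{
    if (clock_ms <= st->previous_ms)
    {
        /* clock stood still or leapt backwards: bump the counter instead */
        st->counter++;
        if (st->counter > 0x3ffff)
        {
            /* 18 bits exhausted: carry the overflow into the timestamp */
            st->counter = 0;
            st->previous_ms++;
        }
        return st->previous_ms;
    }

    /* clock advanced: re-seed the counter with its top bit cleared */
    st->counter = seed & 0x1ffff;
    st->previous_ms = clock_ms;
    return clock_ms;
}
```

With a seed whose low 17 bits are all ones, it takes 0x20001 further calls within the same millisecond before the stored timestamp runs one millisecond ahead of the clock. A backward clock leap simply repeats the previous timestamp with a larger counter.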

Best regards, Andrey Borodin.

Attachments:

v16-0001-Implement-UUID-v7.patch (application/octet-stream)

From 8d07eec66f93adf1cfa2512f5b89f41fe558dfe6 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v16] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit adds the alias uuidv4() for gen_random_uuid().

We also add a function to extract the timestamp from UUID v1, v6 and v7.
To let users distinguish UUID versions and variants,
we add the functions uuid_extract_ver() and uuid_extract_var().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  61 +++++++-
 src/backend/utils/adt/uuid.c             | 179 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 +++++++++
 src/test/regress/sql/uuid.sql            |  26 ++++
 7 files changed, 356 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4a..97abf7f4c69 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14128,13 +14128,68 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_ver</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_var</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID. UUIDv4 is one of the
+   most commonly used types of UUID. It is appropriate when random
+   distribution of keys does not affect performance of an application or
+   when exposing the generation time of a UUID has unacceptable security
+   or business intelligence implications.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random). It provides much better data locality
+   than UUIDv4, which can greatly improve performance when UUID is used in a
+   B-tree index (the default index type in PostgreSQL). To achieve this data
+   locality, UUIDv7 embeds its own generation time into the UUID. If exposing
+   such a timestamp has unacceptable security or business intelligence
+   implications, then uuidv4() should be used instead.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
+   versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with two exceptions: the stored time never moves backwards, and a counter
+   overflow is carried into the timestamp.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c73..ef14d6cba27 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,177 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	TimestampTz ts;
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d5..f9be09464be 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e3..652aeb428e2 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d68..f4b9ff654ab 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d6..f401a550885 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d2..c7362cf4e13 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.42.0

#96

Junwang Zhao

zhjwpku@gmail.com

almost 2 years ago

In reply to: Andrey M. Borodin (#95)

Re: UUID v7

Hi Andrey,

On Tue, Jan 30, 2024 at 5:56 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 30 Jan 2024, at 12:28, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

I think this phrase is outdated: "This function can optionally accept a timestamp used instead of current time.
This allows implementation of k-way sotable identifiers.”

Fixed.

This phrase is wrong: "Both functions return a version 4 (random) UUID.”

This applies to functions gen_random_uuid() and uuidv4().
For this phrase the reason is unclear and the phrase is most likely incorrect:
if large batches of UUIDs are generated at the
+   same time it's possible that some UUIDs will store a time that is slightly later
+   than their actual generation time
I’ve rewritten this phrase, hope it’s more clear now.

Best regards, Andrey Borodin.

+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+ pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+ uint16_t result;
+ result = uuid->data[8] >> 6;
+
+ PG_RETURN_UINT16(result);
+}
\ No newline at end of file

It's always good to add a newline at the end of a source file, though
this might be nitpicky.

--
Regards
Junwang Zhao

#97

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Junwang Zhao (#96)

1 attachment(s)

Re: UUID v7

On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:

It's always good to add a newline at the end of a source file, though
this might be nitpicky.

Thanks, also fixed warning found by CFBot.

Best regards, Andrey Borodin.

Attachments:

v17-0001-Implement-UUID-v7.patch (application/octet-stream)

From 148750ca11235bc24ef07ceb549b910ba2a862c2 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v17] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function gen_random_uuid().

Also we add a function to extract timestamp from UUID v1, v6 and v7.
To allow user to distinguish various UUID versions and variants
we add functions uuid_extract_ver() and uuid_extract_var().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  61 +++++++-
 src/backend/utils/adt/uuid.c             | 178 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 +++++++++
 src/test/regress/sql/uuid.sql            |  26 ++++
 7 files changed, 355 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6788ba8ef4a..97abf7f4c69 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14128,13 +14128,68 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_time</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_ver</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_var</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID:
+   <function>gen_random_uuid</function>, <function>uuidv4</function>, and <function>uuidv7</function>.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID. UUIDv4 is one of the
+   most commonly used types of UUID. It is appropriate when random
+   distribution of keys does not affect performance of an application or
+   when exposing the generation time of a UUID has unacceptable security
+   or business intelligence implications.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random). It provides much better data locality
+   than UUIDv4, which can greatly improve performance when UUID is used in a
+   B-tree index (the default index type in PostgreSQL). To achieve this data
+   locality, UUIDv7 embeds its own generation time into the UUID. If exposing
+   such a timestamp has unacceptable security or business intelligence
+   implications, then uuidv4() should be used instead.
+<synopsis>
+<function>uuid_extract_time</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a timestamptz from UUID versions 1, 6 and 7. For other
+   versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_ver</function> (uuid) <returnvalue>int2</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_var</function> (uuid) <returnvalue>int2</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c73..a157f69c2b7 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -421,3 +426,176 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+Datum
+uuid_extract_time(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+Datum
+uuid_extract_ver(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+Datum
+uuid_extract_var(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+	result = uuid->data[8] >> 6;
+
+	PG_RETURN_UINT16(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 29af4ce65d5..f9be09464be 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate random UUID',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 7',
+  proname => 'uuid_extract_time', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_time' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_ver', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_ver' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_var', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_var' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e3..652aeb428e2 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d68..f4b9ff654ab 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_time(uuid)
+uuid_extract_ver(uuid)
+uuid_extract_var(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d6..f401a550885 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+ uuid_extract_ver 
+------------------
+                7
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+ uuid_extract_ver 
+------------------
+                5
+(1 row)
+
+SELECT uuid_extract_var(uuidv7());
+ uuid_extract_var 
+------------------
+                2
+(1 row)
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+ uuid_extract_time 
+-------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d2..c7362cf4e13 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_ver(uuidv7());
+SELECT uuid_extract_ver('{11111111-1111-1111-1111-111111111111}') IS NULL;
+SELECT uuid_extract_ver('{11111111-1111-5111-8111-111111111111}');
+SELECT uuid_extract_var(uuidv7());
+
+-- uuid_extract_time() must refuse to accept non-UUIDv7
+SELECT uuid_extract_time(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_time('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_time('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.42.0

#98

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#97)

Re: UUID v7

typo:
being carried to time step

should be: being carried to timestamp

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Tuesday, 30 January 2024 at 04:35:45 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:

It's always good to add a newline at the end of a source file, though
this might be nitpicky.

Thanks, also fixed warning found by CFBot.

Best regards, Andrey Borodin.

#99

peter@eisentraut.org

almost 2 years ago

In reply to: Andrey M. Borodin (#97)

Re: UUID v7

On 30.01.24 14:35, Andrey M. Borodin wrote:

On 30 Jan 2024, at 15:33, Junwang Zhao <zhjwpku@gmail.com> wrote:

It's always good to add a newline at the end of a source file, though
this might be nitpicky.

Thanks, also fixed warning found by CFBot.

I have various comments on this patch:

- doc/src/sgml/func.sgml

The documentation of the new functions should be broken up a bit.
It's all one paragraph now. At least make it several paragraphs, or
possibly tables or something else.

Avoid listing the functions twice: Once before the description and
then again in the description. That's just going to get out of date.
The first listing is not necessary, I think.

The return values in the documentation should use the public-facing
type names, like "timestamp with time zone" and "smallint".

The descriptions of the UUID generation functions use handwavy
language in their descriptions, like "It provides much better data
locality" or "unacceptable security or business intelligence
implications", which isn't useful. Either we cut that all out and
just say, it creates a UUIDv7, done, look elsewhere for more
information, or we provide some more concretely useful details.

We shouldn't label a link as "IETF standard" when it's actually a
draft.

- src/include/catalog/pg_proc.dat

The description of uuidv4 should be "generate UUID version 4", so that
it parallels uuidv7.

The description of uuid_extract_time says 'extract timestamp from UUID
version 7', the implementation is not limited to version 7.

I think uuid_extract_time should be named uuid_extract_timestamp,
because it extracts a timestamp, not a time.

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant. Otherwise, it's hard
to tell them apart, with only one letter different.

- src/test/regress/sql/uuid.sql

Why are the tests using the input format '{...}', which is not the
standard one?

- src/backend/utils/adt/uuid.c

All this new code should have more comments. There is a lot of bit
twiddling going on, and I suppose one is expected to follow along in
the RFC? At least each function should have a header comment, so one
doesn't have to check in pg_proc.dat what it's supposed to do.

I'm suspicious that these functions all appear to return null for
erroneous input, rather than raising errors. I think at least some
explanation for this should be recorded somewhere.

I think the behavior of uuid_extract_var(iant) is wrong. The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

The uuidv7 function could really use a header comment that explains
the choices that were made. The RFC draft provides various options
that implementations could use; we should describe which ones we
chose.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2
method 3.

You use some kind of counter, but could you explain which method that
counter implements?

I don't see any acknowledgment of issues relating to concurrency or
restarts. Like, how do we prevent duplicates being generated by
concurrent sessions or between restarts? Maybe the counter or random
stuff does that, but it's not explained.

#100

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Peter Eisentraut (#99)

1 attachment(s)

Re: UUID v7

Hi Peter,

Thank you for such a thoughtful review.

On 6 Mar 2024, at 12:13, Peter Eisentraut <peter@eisentraut.org> wrote:

I have various comments on this patch:

- doc/src/sgml/func.sgml

The documentation of the new functions should be broken up a bit.
It's all one paragraph now. At least make it several paragraphs, or
possibly tables or something else.

I've split the functions that generate UUIDs from the functions that extract data from them.

Avoid listing the functions twice: Once before the description and
then again in the description. That's just going to get out of date.
The first listing is not necessary, I think.

Fixed.

The return values in the documentation should use the public-facing
type names, like "timestamp with time zone" and "smallint".

Fixed.

The descriptions of the UUID generation functions use handwavy
language in their descriptions, like "It provides much better data
locality" or "unacceptable security or business intelligence
implications", which isn't useful. Either we cut that all out and
just say, it creates a UUIDv7, done, look elsewhere for more
information, or we provide some more concretely useful details.

I've removed all that stuff entirely.

We shouldn't label a link as "IETF standard" when it's actually a
draft.

Fixed.

Well, all my modifications to the documentation are kind of blind... I tried "make docs", but it gives me a gazillion errors... Is there an easy way to see the resulting HTML?

- src/include/catalog/pg_proc.dat

The description of uuidv4 should be "generate UUID version 4", so that
it parallels uuidv7.

Fixed.

The description of uuid_extract_time says 'extract timestamp from UUID
version 7', the implementation is not limited to version 7.

Fixed.
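
For readers following the timestamp discussion, here is a minimal standalone sketch of how the v7 and v1 timestamps can be read out of the 16 raw bytes. It mirrors the byte layout used in the v17 hunk of uuid_extract_time() quoted earlier in the thread, but the helper names (sketch_uuidv7_unix_ms, sketch_uuidv1_unix_us) and the GREGORIAN_TO_UNIX_TICKS constant are illustrative assumptions, not part of the patch, and the code does not use any PostgreSQL types.

```c
#include <stdint.h>

/* 100-ns ticks between 1582-10-15 (Gregorian epoch) and 1970-01-01: 141427 days. */
#define GREGORIAN_TO_UNIX_TICKS UINT64_C(122192928000000000)

/* Hypothetical helper: Unix milliseconds stored in bytes 0..5 of a UUIDv7. */
static uint64_t
sketch_uuidv7_unix_ms(const unsigned char data[16])
{
    uint64_t ms = 0;

    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | data[i];   /* big-endian, most significant byte first */
    return ms;
}

/* Hypothetical helper: Unix microseconds derived from a UUIDv1 timestamp. */
static int64_t
sketch_uuidv1_unix_us(const unsigned char data[16])
{
    uint64_t ticks;

    ticks  = ((uint64_t) (data[6] & 0x0f)) << 56;  /* time_hi, version nibble masked off */
    ticks |= ((uint64_t) data[7]) << 48;
    ticks |= ((uint64_t) data[4]) << 40;           /* time_mid */
    ticks |= ((uint64_t) data[5]) << 32;
    ticks |= ((uint64_t) data[0]) << 24;           /* time_low */
    ticks |= ((uint64_t) data[1]) << 16;
    ticks |= ((uint64_t) data[2]) << 8;
    ticks |= ((uint64_t) data[3]);

    /* shift to the Unix epoch, then convert 100-ns ticks to microseconds */
    return ((int64_t) ticks - (int64_t) GREGORIAN_TO_UNIX_TICKS) / 10;
}
```

Fed the bytes of the regression-test values 017F22E2-79B0-7CC3-98C4-DC0C0C07398F and C232AB00-9414-11EC-B3C8-9F6BDECED846, both helpers land on the same instant (Unix second 1645557742), which is what the three equality tests in uuid.sql rely on.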

I think uuid_extract_time should be named uuid_extract_timestamp,
because it extracts a timestamp, not a time.

Renamed.

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant. Otherwise, it's hard
to tell them apart, with only one letter different.

Renamed.

- src/test/regress/sql/uuid.sql

Why are the tests using the input format '{...}', which is not the
standard one?

Fixed.

- src/backend/utils/adt/uuid.c

All this new code should have more comments. There is a lot of bit
twiddling going on, and I suppose one is expected to follow along in
the RFC? At least each function should have a header comment, so one
doesn't have to check in pg_proc.dat what it's supposed to do.

I've added header comments. One big comment is attached to uuidv7(); I tried to take most of it from the RFC. Still, there are a lot of my own additions that now need review...

I'm suspicious that these functions all appear to return null for
erroneous input, rather than raising errors. I think at least some
explanation for this should be recorded somewhere.

The input is not erroneous per se.
But the fact that
# select 1/0;
ERROR: division by zero
makes me consider throwing an error. There was some argument upthread for not throwing an error, but I cannot find it now... Maybe I accepted this behaviour as more user-friendly.
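
To make the "NULL instead of error" behaviour concrete, here is a small sketch of the version check as it appears in the v17 hunk, rewritten outside PostgreSQL. The SKETCH_NO_VERSION sentinel stands in for SQL NULL and, like the function name, is an assumption for illustration only.

```c
/* Stand-in for SQL NULL in this standalone sketch (hypothetical name). */
#define SKETCH_NO_VERSION (-1)

/*
 * Return the 4-bit version field, but only when the variant bits are 10xx,
 * i.e. the layout described by RFC 4122 and its successor draft.  For any
 * other variant the version nibble has no defined meaning, so the sketch
 * reports "no version" rather than failing.
 */
static int
sketch_uuid_version(const unsigned char data[16])
{
    if ((data[8] & 0xc0) != 0x80)
        return SKETCH_NO_VERSION;
    return data[6] >> 4;
}
```

With the regression-test inputs, 11111111-1111-1111-1111-111111111111 has variant bits 00 and yields the sentinel, while 11111111-1111-5111-8111-111111111111 has variant bits 10 and yields 5. Whether the real function should return NULL or raise an error for the first case is exactly the open question above.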

I think the behavior of uuid_extract_var(iant) is wrong. The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for the implemented variant. I've made a version that implements the full Table 1 from Section 4.1.
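
As a rough illustration of what "the full Table 1" means, the sketch below classifies the top four bits of octet 8 into the four variant families the draft lists (0xxx, 10xx, 110x, 111x). The enum and function names are hypothetical, and the value actually returned by the v18 function may be encoded differently; this only shows the bit tests.

```c
/* Variant families from Table 1 of the draft (names are illustrative). */
typedef enum
{
    SKETCH_VARIANT_NCS,         /* 0xxx: reserved, NCS backward compatibility */
    SKETCH_VARIANT_RFC,         /* 10xx: the variant the draft itself specifies */
    SKETCH_VARIANT_MICROSOFT,   /* 110x: reserved, Microsoft backward compatibility */
    SKETCH_VARIANT_FUTURE       /* 111x: reserved for future definition */
} SketchUuidVariant;

static SketchUuidVariant
sketch_uuid_variant(const unsigned char data[16])
{
    unsigned char nibble = data[8] >> 4;    /* the four bits shown in Table 1 */

    if ((nibble & 0x8) == 0x0)
        return SKETCH_VARIANT_NCS;
    if ((nibble & 0xc) == 0x8)
        return SKETCH_VARIANT_RFC;
    if ((nibble & 0xe) == 0xc)
        return SKETCH_VARIANT_MICROSOFT;
    return SKETCH_VARIANT_FUTURE;
}
```

The v17 code (data[8] >> 6) can only tell apart the first two bits, so it cannot separate the 110x and 111x rows; looking at up to three bits, as above, is enough to separate all four rows.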

The uuidv7 function could really use a header comment that explains
the choices that were made. The RFC draft provides various options
that implementations could use; we should describe which ones we
chose.

Done.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.

I chose method 2 over method 3 as the more portable one. Can we be sure how many bits of real precision (beyond milliseconds) are available across different OSes? Even if we stored an extra 10 bits of timestamp, we could not safely extract them.
These bits could promote inter-backend sortability, i.e. when many backends generate data fast, this data is still somewhat ordered even within 1 ms. But I think the benefits of this sortability are outweighed by portability (unknown real resolution) and simplicity (we don't store microseconds, and thus do not try to extract them).
All these arguments are weak, but if one method were strictly better than the other, there would be only one method.
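
For comparison, here is a sketch of what the rejected alternative (method 3, extra clock precision in rand_a) could look like. As I read the draft, the sub-millisecond fraction is scaled to the 12 bits of rand_a by multiplying by 4096; the function name and the assumption that the input is the microsecond remainder 0..999 are mine, and nothing like this is in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale the microseconds elapsed within the current
 * millisecond (assumed to be in the range 0..999) to a 12-bit value that
 * could be stored in rand_a.  Integer division truncates, so the result
 * stays below 4096.
 */
static uint16_t
sketch_submilli_to_rand_a(uint32_t usec_within_ms)
{
    return (uint16_t) ((usec_within_ms * 4096u) / 1000u);
}
```

The mapping is monotonic, so UUIDs generated later within the same millisecond would sort later, even across backends. The portability concern raised above still applies: if the clock's real resolution is coarser than a microsecond, the low bits of this value carry no information.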

You use some kind of counter, but could you explain which method that
counter implements?

I described the counter in the uuidv7() header comment.
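
Since the counter is the part reviewers asked about most, here is a standalone sketch of the 18-bit layout as it appears in the v17 hunk: 4 bits in the low nibble of octet 6, 8 bits in octet 7, and 6 bits in the low part of octet 8, with overflow carried into the millisecond timestamp. The function names are hypothetical and the v18 code may differ in detail.

```c
#include <stdint.h>

#define SKETCH_COUNTER_MAX 0x3ffff  /* 18 bits */

/* Write the counter around the version (0111) and variant (10) bits. */
static void
sketch_pack_counter(unsigned char data[16], uint32_t counter)
{
    data[6] = (unsigned char) (0x70 | ((counter >> 14) & 0x0f));    /* top 4 bits */
    data[7] = (unsigned char) ((counter >> 6) & 0xff);              /* next 8 bits */
    data[8] = (unsigned char) (0x80 | (counter & 0x3f));            /* low 6 bits */
}

/* Read the counter back, ignoring the version and variant bits. */
static uint32_t
sketch_unpack_counter(const unsigned char data[16])
{
    return ((uint32_t) (data[8] & 0x3f)) |
           (((uint32_t) data[7]) << 6) |
           (((uint32_t) (data[6] & 0x0f)) << 14);
}

/*
 * Advance the counter within one millisecond.  On overflow the counter
 * restarts at zero and the stored timestamp moves one millisecond ahead,
 * which is why an extracted timestamp can be slightly later than the real
 * generation time.  Returns 1 if the carry happened, 0 otherwise.
 */
static int
sketch_counter_step(uint32_t *counter, uint64_t *timestamp_ms)
{
    if (++(*counter) > SKETCH_COUNTER_MAX)
    {
        *counter = 0;
        (*timestamp_ms)++;
        return 1;
    }
    return 0;
}
```

Note that these are per-process values in the patch (static variables in the backend), so the counter orders UUIDs within one backend only; uniqueness across concurrent backends rests on the random bits that follow the counter.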

I don't see any acknowledgment of issues relating to concurrency or
restarts. Like, how do we prevent duplicates being generated by
concurrent sessions or between restarts? Maybe the counter or random
stuff does that, but it's not explained.

I think a restart takes more than 1 ms, so this is covered by the time tick.
I've added a paragraph about the frequency of generation to the uuidv7() header comment.

Best regards, Andrey Borodin.

Attachments:

v18-0001-Implement-UUID-v7.patch (application/octet-stream)

From 8d978d0b8fd07f94dd64f1c0fedb93d0555919bc Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v18] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function gen_random_uuid().

Also we add a function to extract timestamp from UUID v1, v6 and v7.
To allow user to distinguish various UUID versions and variants
we add functions uuid_extract_version() and uuid_extract_variant().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  54 ++++-
 src/backend/utils/adt/uuid.c             | 278 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 ++++++
 src/test/regress/sql/uuid.sql            |  26 +++
 7 files changed, 448 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5030a1045f..afb9ca12ee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,61 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_timestamp</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_version</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_variant</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
+<synopsis>
+  </para>
+
+  <para>
+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a <type>timestamp with time zone</type> from UUID versions 1, 6 and 7.
+   For other versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard draft</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_variant</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..72faec3e66 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -402,6 +407,11 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Routine to generate UUID version 4.
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -421,3 +431,271 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+/*
+ * Routine to generate UUID version 7.
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * UUIDv7 Field and Bit Layout:
+ * ----------
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                           unix_ts_ms                          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          unix_ts_ms           |  ver  |       rand_a          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |var|                        rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                            rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * unix_ts_ms:
+ *  48 bit big-endian unsigned number of Unix epoch timestamp in milliseconds
+ *  as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).
+ *
+ * ver:
+ *  The 4 bit version field as defined by Section 4.2, set to 0b0111 (7).
+ *  Occupies bits 48 through 51 of octet 6.
+ *
+ * rand_a:
+ *  Most significant 12 bits of 18-bit counter. This counter is designed to
+ *  guarantee additional monotonicity as per Section 6.2 (Method 2). rand_a
+ *  occupies bits 52 through 63 (octets 6-7).
+ *
+ * var:
+ *  The 2 bit variant field as defined by Section 4.1, set to 0b10. Occupies
+ *  bits 64 and 65 of octet 8.
+ *
+ * rand_b:
+ *  The first 6 bits are the least significant 6 bits of the counter. The final
+ *  56 bits are filled with pseudo-random data to provide uniqueness as per
+ *  Section 6.9. rand_b occupies bits 66 through 127 (octets 8-15).
+ * ----------
+ * 
+ * Monotonic Random (Method 2) can be implemented with a counter of arbitrary
+ * size. We choose 18 bits to reuse all the space in the bytes touched by the
+ * ver and var fields, plus the rand_a bytes between them.
+ * Whenever the timestamp unix_ts_ms moves forward, the counter bits are
+ * reinitialized. Reinitialization always sets the most significant bit to 0;
+ * the other bits are initialized with random numbers. This gives us about 192K
+ * UUIDs on average within one millisecond without overflow. Ought to be enough.
+ * Whenever the counter overflows, the overflow is carried into unix_ts_ms.
+ * So generating UUIDs at a rate higher than 128MHz might lead to using
+ * timestamps ahead of time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequencies less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (time-ordered) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Routine to extract the timestamp from a UUID of variant 0b10.
+ * Returns NULL if the variant is not 0b10 or the version is not 1, 6 or 7.
+ */
+Datum
+uuid_extract_timestamp(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Routine to extract UUID version from variant 0b10
+ * Returns NULL if UUID is not 0b10
+ */
+Datum
+uuid_extract_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+/*
+ * Routine to extract UUID variant. Can return only 0, 0b10, 0b110 and 0b111.
+ */
+Datum
+uuid_extract_variant(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	/*
+	 * The contents of the variant field, where the letter "x" indicates a
+	 * "don't-care" value.
+	 * ----------
+	 * Msb0		Msb1	Msb2	Msb3	Variant	Description
+	 * 0		x		x		x		1-7		Reserved, NCS backward
+	 * 											compatibility and includes Nil
+	 * 											UUID as per Section 5.9.
+	 * 1		0		x		x		8-9,A-B	The variant specified in RFC.
+	 * 1		1		0		x		C-D		Reserved, Microsoft Corporation
+	 * 											backward compatibility.
+	 * 1		1		1		x		E-F		Reserved for future definition
+	 * 											and includes Max UUID as per
+	 * 											Section 5.10 of RFC.
+	 * ----------
+	 */
+
+	uint8_t nibble = uuid->data[8] >> 4;
+	if (nibble < 8)
+		result = 0;
+	else if (nibble < 0xC)
+		result = 0b10;
+	else if (nibble < 0xE)
+		result = 0b110;
+	else
+		result = 0b111;
+
+	PG_RETURN_UINT16(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e4115cd084..eb1d075595 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 1, 6 or 7',
+  proname => 'uuid_extract_timestamp', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_timestamp' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_version', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_variant', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_variant' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..44542c56ef 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_timestamp(uuid)
+uuid_extract_version(uuid)
+uuid_extract_variant(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..2930bcc7f0 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+ uuid_extract_version 
+----------------------
+                    5
+(1 row)
+
+SELECT uuid_extract_variant(uuidv7());
+ uuid_extract_variant 
+----------------------
+                    2
+(1 row)
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+ uuid_extract_timestamp 
+------------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..0f9bd1a661 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+SELECT uuid_extract_variant(uuidv7());
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)
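
For readers following the uuidv7() comment in the patch above, here is a small standalone sketch of how an 18-bit counter can be spread over octets 6 to 8 next to the ver and var fields. It is an illustration under stated assumptions, not the patch's code: the names `pack_counter`, `unpack_counter` and `COUNTER_MAX` are hypothetical, and the UUID is taken to be a plain 16-byte array.

```c
#include <assert.h>
#include <stdint.h>

#define COUNTER_MAX 0x3ffffu	/* largest 18-bit value (hypothetical name) */

/* Store an 18-bit counter next to the version (7) and variant (0b10) bits. */
static void
pack_counter(unsigned char uuid[16], uint32_t counter)
{
	assert(counter <= COUNTER_MAX);
	/* octet 6: version nibble, then the most significant 4 counter bits */
	uuid[6] = (unsigned char) (0x70 | (counter >> 14));
	/* octet 7: the next 8 counter bits */
	uuid[7] = (unsigned char) ((counter >> 6) & 0xff);
	/* octet 8: variant bits, then the least significant 6 counter bits */
	uuid[8] = (unsigned char) (0x80 | (counter & 0x3f));
}

/* Read the 18-bit counter back from octets 6-8. */
static uint32_t
unpack_counter(const unsigned char uuid[16])
{
	return ((uint32_t) (uuid[6] & 0x0f) << 14) |
		((uint32_t) uuid[7] << 6) |
		(uint32_t) (uuid[8] & 0x3f);
}
```

Because the counter sits directly after the 48-bit timestamp in byte order, incrementing it makes the UUID compare greater under plain bytewise comparison, which is what the monotonicity guarantee within one backend relies on.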

#101

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Andrey M. Borodin (#100)

1 attachment(s)

Re: UUID v7

On 10 Mar 2024, at 17:59, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I tried to "make docs", but it gives me a gazillion errors... Is there an easy way to see the resulting HTML?

Oops, CFbot expectedly found a problem...
Sorry for the noise, this version, I hope, will pass all the tests.
Thanks!

Best regards, Andrey Borodin.

Attachments:

v19-0001-Implement-UUID-v7.patch (application/octet-stream)

From 53c1799cc731b1c254cd3fe185c12532367e61c1 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v19] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important function here
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit adds the alias uuidv4() for the function gen_random_uuid().

Also we add a function to extract the timestamp from UUID v1, v6 and v7.
To let users distinguish UUID versions and variants,
we add functions uuid_extract_version() and uuid_extract_variant().

Author: Andrey Borodin
Reviewers: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewers: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewers: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  54 ++++-
 src/backend/utils/adt/uuid.c             | 278 +++++++++++++++++++++++
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 ++++++
 src/test/regress/sql/uuid.sql            |  26 +++
 7 files changed, 448 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5030a1045f..34977163cf 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14130,13 +14130,61 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_timestamp</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_version</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_variant</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
+  </para>
+
+  <para>
+<synopsis>
+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a <type>timestamp with time zone</type> from UUID versions 1, 6 and 7.
+   For other versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard draft</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_variant</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 73dfd711c7..72faec3e66 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -402,6 +407,11 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Routine to generate UUID version 4.
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -421,3 +431,271 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+/*
+ * Routine to generate UUID version 7.
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * UUIDv7 Field and Bit Layout:
+ * ----------
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                           unix_ts_ms                          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          unix_ts_ms           |  ver  |       rand_a          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |var|                        rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                            rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * unix_ts_ms:
+ *  48 bit big-endian unsigned number of Unix epoch timestamp in milliseconds
+ *  as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).
+ *
+ * ver:
+ *  The 4 bit version field as defined by Section 4.2, set to 0b0111 (7).
+ *  Occupies bits 48 through 51 of octet 6.
+ *
+ * rand_a:
+ *  Most significant 12 bits of 18-bit counter. This counter is designed to
+ *  guarantee additional monotonicity as per Section 6.2 (Method 2). rand_a
+ *  occupies bits 52 through 63 (octets 6-7).
+ *
+ * var:
+ *  The 2 bit variant field as defined by Section 4.1, set to 0b10. Occupies
+ *  bits 64 and 65 of octet 8.
+ *
+ * rand_b:
+ *  The first 6 bits are the least significant 6 bits of the counter. The final
+ *  56 bits are filled with pseudo-random data to provide uniqueness as per
+ *  Section 6.9. rand_b occupies bits 66 through 127 (octets 8-15).
+ * ----------
+ * 
+ * Monotonic Random (Method 2) can be implemented with a counter of arbitrary
+ * size. We choose 18 bits to reuse all the space in the bytes touched by the
+ * ver and var fields, plus the rand_a bytes between them.
+ * Whenever the timestamp unix_ts_ms moves forward, the counter bits are
+ * reinitialized. Reinitialization always sets the most significant bit to 0;
+ * the other bits are initialized with random numbers. This gives us about 192K
+ * UUIDs on average within one millisecond without overflow. Ought to be enough.
+ * Whenever the counter overflows, the overflow is carried into unix_ts_ms.
+ * So generating UUIDs at a rate higher than 128MHz might lead to using
+ * timestamps ahead of time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequencies less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (time-ordered) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Routine to extract the timestamp from a UUID of variant 0b10.
+ * Returns NULL if the variant is not 0b10 or the version is not 1, 6 or 7.
+ */
+Datum
+uuid_extract_timestamp(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Routine to extract UUID version from variant 0b10
+ * Returns NULL if UUID is not 0b10
+ */
+Datum
+uuid_extract_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+/*
+ * Routine to extract UUID variant. Can return only 0, 0b10, 0b110 and 0b111.
+ */
+Datum
+uuid_extract_variant(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	/*
+	 * The contents of the variant field, where the letter "x" indicates a
+	 * "don't-care" value.
+	 * ----------
+	 * Msb0		Msb1	Msb2	Msb3	Variant	Description
+	 * 0		x		x		x		1-7		Reserved, NCS backward
+	 * 											compatibility and includes Nil
+	 * 											UUID as per Section 5.9.
+	 * 1		0		x		x		8-9,A-B	The variant specified in RFC.
+	 * 1		1		0		x		C-D		Reserved, Microsoft Corporation
+	 * 											backward compatibility.
+	 * 1		1		1		x		E-F		Reserved for future definition
+	 * 											and includes Max UUID as per
+	 * 											Section 5.10 of RFC.
+	 * ----------
+	 */
+
+	uint8_t nibble = uuid->data[8] >> 4;
+	if (nibble < 8)
+		result = 0;
+	else if (nibble < 0xC)
+		result = 0b10;
+	else if (nibble < 0xE)
+		result = 0b110;
+	else
+		result = 0b111;
+
+	PG_RETURN_UINT16(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e4115cd084..eb1d075595 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9174,6 +9174,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 1, 6 or 7',
+  proname => 'uuid_extract_timestamp', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_timestamp' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_version', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_variant', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_variant' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..44542c56ef 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_timestamp(uuid)
+uuid_extract_version(uuid)
+uuid_extract_variant(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..2930bcc7f0 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+ uuid_extract_version 
+----------------------
+                    5
+(1 row)
+
+SELECT uuid_extract_variant(uuidv7());
+ uuid_extract_variant 
+----------------------
+                    2
+(1 row)
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+ uuid_extract_timestamp 
+------------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..0f9bd1a661 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+SELECT uuid_extract_variant(uuidv7());
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)
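
The three timestamp layouts handled by uuid_extract_timestamp() in the patch above can be checked outside the server. What follows is only a standalone sketch, not the patch's code: the helper names (v7_unix_ms, v1_ticks, v6_ticks, ticks_to_unix_ms) are made up for illustration, and the result is expressed in Unix milliseconds rather than as a TimestampTz. The two Julian-date constants are copied from the timestamp.h hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Julian day numbers, as defined in the patch's timestamp.h hunk. */
#define UNIX_EPOCH_JDATE      2440588
#define GREGORIAN_EPOCH_JDATE 2299161

/* Number of 100-ns ticks between 1582-10-15 and 1970-01-01. */
#define GREGORIAN_TO_UNIX_TICKS \
	((uint64_t) (UNIX_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * 86400 * 10000000)

/* v7: octets 0-5 hold a 48-bit big-endian Unix timestamp in milliseconds. */
static uint64_t
v7_unix_ms(const uint8_t u[16])
{
	uint64_t	ms = 0;

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | u[i];
	return ms;
}

/* v1: time_low in octets 0-3, time_mid in 4-5, time_hi in 6 (low nibble) and 7. */
static uint64_t
v1_ticks(const uint8_t u[16])
{
	uint64_t	lo = ((uint64_t) u[0] << 24) | ((uint64_t) u[1] << 16) |
					 ((uint64_t) u[2] << 8) | u[3];
	uint64_t	mid = ((uint64_t) u[4] << 8) | u[5];
	uint64_t	hi = ((uint64_t) (u[6] & 0x0f) << 8) | u[7];

	return (hi << 48) | (mid << 32) | lo;
}

/* v6: the same 60-bit tick count, stored most significant bits first. */
static uint64_t
v6_ticks(const uint8_t u[16])
{
	uint64_t	t = 0;

	for (int i = 0; i < 6; i++)
		t = (t << 8) | u[i];		/* upper 48 bits */
	t = (t << 4) | (u[6] & 0x0f);	/* skip the version nibble */
	t = (t << 8) | u[7];			/* lowest 8 bits */
	return t;
}

/* Convert 100-ns ticks since the Gregorian epoch to Unix milliseconds. */
static uint64_t
ticks_to_unix_ms(uint64_t ticks)
{
	assert(ticks >= GREGORIAN_TO_UNIX_TICKS);	/* dates before 1970 not handled */
	return (ticks - GREGORIAN_TO_UNIX_TICKS) / 10000;
}
```

Fed the three UUIDs from the regression test, all three paths should agree on the same instant, which is what the test's comparison against a single timestamp literal relies on.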

#102

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey M. Borodin (#101)

1 attachment(s)

Re: UUID v7

Hi,

Oops, CFbot expectedly found a problem...
Sorry for the noise, this version, I hope, will pass all the tests.
Thanks!

Best regards, Andrey Borodin.

I had some issues applying v19 against the current `master` branch.
Please find attached the rebased and slightly tweaked v20.

The patch LGTM. I think it could be merged unless there are any open
issues left. I don't think so, but maybe I missed something.

--
Best regards,
Aleksander Alekseev

Attachments:

v20-0001-Implement-UUID-v7.patch (application/octet-stream)

From 931dc732111d838a2c697c15416fc7c7e9b2247d Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v20] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit also adds uuidv4() as an alias for
gen_random_uuid().

We also add a function to extract the timestamp from UUID v1, v6 and v7.
To let users distinguish UUID versions and variants, we add the
functions uuid_extract_version() and uuid_extract_variant().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  54 ++++-
 src/backend/utils/adt/uuid.c             | 280 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 ++++++
 src/test/regress/sql/uuid.sql            |  26 +++
 7 files changed, 449 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0bb7aeb40e..ae2b8dc491 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14127,13 +14127,61 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_timestamp</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_version</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_variant</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
+  </para>
+
+  <para>
+<synopsis>
+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a <type>timestamp with time zone</type> from UUID versions 1, 6 and 7.
+   For other versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
+</synopsis>
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard draft</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_variant</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the variant bits from a UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index e9c1ec6153..581cd948a3 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
-#include "utils/fmgrprotos.h"
+#include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -406,6 +411,11 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Routine to generate UUID version 4.
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -425,3 +435,271 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+/*
+ * Routine to generate UUID version 7.
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * UUIDv7 Field and Bit Layout:
+ * ----------
+ *  0                   1                   2                   3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                           unix_ts_ms                          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          unix_ts_ms           |  ver  |       rand_a          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |var|                        rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                            rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * unix_ts_ms:
+ *  48 bit big-endian unsigned number of Unix epoch timestamp in milliseconds
+ *  as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).
+ *
+ * ver:
+ *  The 4 bit version field as defined by Section 4.2, set to 0b0111 (7).
+ *  Occupies bits 48 through 51 of octet 6.
+ *
+ * rand_a:
+ *  Most significant 12 bits of 18-bit counter. This counter is designed to
+ *  guarantee additional monotonicity as per Section 6.2 (Method 2). rand_a
+ *  occupies bits 52 through 63 (octets 6-7).
+ *
+ * var:
+ *  The 2 bit variant field as defined by Section 4.1, set to 0b10. Occupies
+ *  bits 64 and 65 of octet 8.
+ *
+ * rand_b:
+ *  Starting 6 bits are least significant 6 bits of a counter. The final 56
+ *  bits filled with pseudo-random data to provide uniqueness as per
+ *  Section 6.9. rand_b Occupies bits 66 through 127 (octets 8-15).
+ * ----------
+ * 
+ * Monotonic Random (Method 2) can be implemented with arbitrary size of a
+ * counter. We choose size 18 to reuse all space of bytes that are touched by
+ * ver and var fields + rand_a bytes between them.
+ * Whenever timestamp unix_ts_ms moves forward, the counter bits are
+ * reinitialized. Reinitialization always sets most significant bit to 0, other
+ * bits are initialized with random numbers. This gives us approximately 192K
+ * UUIDs within one millisecond without overflow. Ought to be enough.
+ * Whenever counter overflow happens, this overflow is translated to increment
+ * of unix_ts_ms. So generation of UUIDs at a rate higher than 128MHz might lead
+ * to using timestamps ahead of time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequencies less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Routine to extract timestamp from UUID of variant 0b10.
+ * Returns NULL if UUID is not variant 0b10 or version is not 1, 6 or 7.
+ */
+Datum
+uuid_extract_timestamp(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, then adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, then adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Routine to extract UUID version from variant 0b10
+ * Returns NULL if UUID is not 0b10
+ */
+Datum
+uuid_extract_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+/*
+ * Routine to extract UUID variant. Can return only 0, 0b10, 0b110 and 0b111.
+ */
+Datum
+uuid_extract_variant(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	/*
+	 * The contents of the variant field, where the letter "x" indicates a
+	 * "don't-care" value.
+	 * ----------
+	 * Msb0		Msb1	Msb2	Msb3	Variant	Description
+	 * 0		x		x		x		1-7		Reserved, NCS backward
+	 * 											compatibility and includes Nil
+	 * 											UUID as per Section 5.9.
+	 * 1		0		x		x		8-9,A-B	The variant specified in RFC.
+	 * 1		1		0		x		C-D		Reserved, Microsoft Corporation
+	 * 											backward compatibility.
+	 * 1		1		1		x		E-F		Reserved for future definition
+	 * 											and includes Max UUID as per
+	 * 											Section 5.10 of RFC.
+	 * ----------
+	 */
+
+	uint8_t nibble = uuid->data[8] >> 4;
+	if (nibble < 8)
+		result = 0;
+	else if (nibble < 0xC)
+		result = 0b10;
+	else if (nibble < 0xE)
+		result = 0b110;
+	else
+		result = 0b111;
+
+	PG_RETURN_UINT16(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 291ed876fc..77f11020c6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9167,6 +9167,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 1, 6 or 7',
+  proname => 'uuid_extract_timestamp', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_timestamp' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_version', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_variant', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_variant' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..44542c56ef 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_timestamp(uuid)
+uuid_extract_version(uuid)
+uuid_extract_variant(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..2930bcc7f0 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+ uuid_extract_version 
+----------------------
+                    5
+(1 row)
+
+SELECT uuid_extract_variant(uuidv7());
+ uuid_extract_variant 
+----------------------
+                    2
+(1 row)
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+ uuid_extract_timestamp 
+------------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..0f9bd1a661 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+SELECT uuid_extract_variant(uuidv7());
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.44.0
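
For readers who want to try the version and variant rules from uuid_extract_version() and uuid_extract_variant() without building the server, here is a minimal standalone sketch. It is an illustration under assumptions, not the patch itself: the function names uuid_variant and uuid_version are hypothetical, and -1 stands in for the SQL NULL the real functions return. The variant values are written in decimal (2, 6, 7) because binary literals such as 0b10 are, as far as I know, standard only from C23 and a compiler extension before that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Classify the variant from the top bits of octet 8.
 * Returns 0 (0xxx), 2 (10xx), 6 (110x) or 7 (111x).
 */
static int
uuid_variant(const uint8_t u[16])
{
	uint8_t		nibble;

	assert(u != NULL);
	nibble = u[8] >> 4;
	if (nibble < 0x8)
		return 0;
	if (nibble < 0xC)
		return 2;
	if (nibble < 0xE)
		return 6;
	return 7;
}

/*
 * Version is the high nibble of octet 6, but it is only meaningful for the
 * 10xx variant. Returns -1 where the SQL function would return NULL.
 */
static int
uuid_version(const uint8_t u[16])
{
	assert(u != NULL);
	if ((u[8] & 0xc0) != 0x80)
		return -1;
	return u[6] >> 4;
}
```

With the regression-test inputs, '11111111-1111-1111-1111-111111111111' has octet 8 equal to 0x11, so it falls into variant 0 and has no version, while '11111111-1111-5111-8111-111111111111' is variant 2 with version 5.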

#103

postgres@jeltef.nl

almost 2 years ago

In reply to: Aleksander Alekseev (#102)

5 attachment(s)

Re: UUID v7

Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Now with the added comments, one thing pops out to me: the comments
mention that we use "Monotonic Random", but as I read the spec, it
explicitly recommends against using an increment of 1 with that
method. I feel like if we use an increment of 1, we're
better off going for the "Fixed-Length Dedicated Counter Bits" method
(i.e. change the code to start the counter at 0). See patch 0005 for
an example of that change.

I'm also wondering if we really want to use the extra rand_b bits for
this. The spec says we MAY, but it does reduce the amount of
randomness in our UUIDs.
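
To make the difference between the two seeding choices concrete, here is a rough standalone sketch of the 18-bit counter logic. It is not taken from any of the attached patches: v7_state and v7_step are hypothetical names, and the random seed is passed in by the caller so the behaviour is reproducible. Passing seed = 0 on every call models the dedicated counter that starts at 0; passing random bits models the randomly seeded counter, whose most significant bit is cleared as a rollover guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTER_BITS 18
#define COUNTER_MAX  ((UINT32_C(1) << COUNTER_BITS) - 1)	/* 0x3ffff */

/* Backend-local generator state (hypothetical struct for this sketch). */
typedef struct
{
	uint64_t	prev_ms;	/* last timestamp handed out, in ms */
	uint32_t	counter;	/* 18-bit sequence counter */
} v7_state;

/*
 * Advance the state for one UUID and return the timestamp to store in it.
 * "seed" is used only when the clock has moved forward.
 */
static uint64_t
v7_step(v7_state *st, uint64_t now_ms, uint32_t seed)
{
	assert(st != NULL);
	if (now_ms > st->prev_ms)
	{
		/* new millisecond: reseed, keeping the top counter bit at 0 */
		st->prev_ms = now_ms;
		st->counter = seed & (COUNTER_MAX >> 1);
	}
	else if (++st->counter > COUNTER_MAX)
	{
		/* counter overflow is carried into the timestamp */
		st->counter = 0;
		st->prev_ms++;
	}
	return st->prev_ms;
}
```

In this model a counter that always starts at 0 leaves room for 2^18 = 262144 values per millisecond before the overflow carries into the timestamp, while a random seed with the top bit cleared leaves between 131072 and 262144, depending on where it starts.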

Attachments:

v21-0001-Implement-UUID-v7.patch (application/octet-stream)

From 8b304924e3f859c15d923906f8b23ca1b8aa03b3 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v21 1/5] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit also adds uuidv4() as an alias for
gen_random_uuid().

We also add a function to extract the timestamp from UUID v1, v6 and v7.
To let users distinguish UUID versions and variants, we add the
functions uuid_extract_version() and uuid_extract_variant().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  54 ++++-
 src/backend/utils/adt/uuid.c             | 280 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  15 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   5 +
 src/test/regress/expected/uuid.out       |  71 ++++++
 src/test/regress/sql/uuid.sql            |  26 +++
 7 files changed, 449 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0bb7aeb40ec..2219a0f63ce 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14127,13 +14127,61 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_timestamp</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_version</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_variant</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
+  </para>
+
+  <para>
+<synopsis>
+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a <type>timestamp with time zone</type> from UUID versions 1, 6 and 7.
+   For other versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
+</synopsis>
+   This function extracts a version bits from UUID of variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard draft</ulink>
+   (b10xx variant). For other variants this function returns NULL.
+<synopsis>
+<function>uuid_extract_variant</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts a vartiant bits from UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index e9c1ec61537..581cd948a30 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
-#include "utils/fmgrprotos.h"
+#include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -406,6 +411,11 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Routine to generate UUID version 4.
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -425,3 +435,271 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+/*
+ * Routine to generate UUID version 7.
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * UUIDv7 Field and Bit Layout:
+ * ----------
+ *  0                   1                   2                   3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                           unix_ts_ms                          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          unix_ts_ms           |  ver  |       rand_a          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |var|                        rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                            rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * unix_ts_ms:
+ *  48 bit big-endian unsigned number of Unix epoch timestamp in milliseconds
+ *  as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).
+ *
+ * ver:
+ *  The 4 bit version field as defined by Section 4.2, set to 0b0111 (7).
+ *  Occupies bits 48 through 51 of octet 6.
+ *
+ * rand_a:
+ *  Most significant 12 bits of 18-bit counter. This counter is designed to
+ *  guarantee additional monotonicity as per Section 6.2 (Method 2). rand_a
+ *  occupies bits 52 through 63 (octets 6-7).
+ *
+ * var:
+ *  The 2 bit variant field as defined by Section 4.1, set to 0b10. Occupies
+ *  bits 64 and 65 of octet 8.
+ *
+ * rand_b:
+ *  Starting 6 bits are least significant 6 bits of a counter. The final 56
+ *  bits filled with pseudo-random data to provide uniqueness as per
+ *  Section 6.9. rand_b Occupies bits 66 through 127 (octets 8-15).
+ * ----------
+ * 
+ * Monotonic Random (Method 2) can be implemented with arbitrary size of a
+ * counter. We choose size 18 to reuse all space of bytes that are touched by
+ * ver and var fields + rand_a bytes between them.
+ * Whenever timestamp unix_ts_ms is moving forward, this counter bits are
+ * reinitialized. Rinilialization always sets most significant bit to 0, other
+ * bits are initialized with random numbers. This gives as approximately 192K
+ * UUIDs within one millisecond without overflow. Outh to be enough.
+ * Whenever counter overflow happens, this overflow is translated to increment
+ * of unix_ts_ms. So generation of UUIDs ate rate higher than 128MHz might lead
+ * to using timestamps ahead of time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequences less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t tms;
+	struct timeval tp;
+	bool increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/* Time did not advance from the previous generation, we must increment counter */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char)(sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers.
+		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
+							(((uint32_t)uuid->data[7]) << 6) +
+							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char)(tms >> 40);
+	uuid->data[1] = (unsigned char)(tms >> 32);
+	uuid->data[2] = (unsigned char)(tms >> 24);
+	uuid->data[3] = (unsigned char)(tms >> 16);
+	uuid->data[4] = (unsigned char)(tms >> 8);
+	uuid->data[5] = (unsigned char)tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Routine to extract UUID version from variant 0b10
+ * Returns NULL if UUID is not 0b10 or version is not 1,6 or7.
+ */
+Datum
+uuid_extract_timestamp(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms =			  uuid->data[5];
+		tms += ((uint64_t)uuid->data[4]) << 8;
+		tms += ((uint64_t)uuid->data[3]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 32;
+		tms += ((uint64_t)uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 24;
+		tms += ((uint64_t)uuid->data[1]) << 16;
+		tms += ((uint64_t)uuid->data[2]) << 8;
+		tms += ((uint64_t)uuid->data[3]);
+		tms += ((uint64_t)uuid->data[4]) << 40;
+		tms += ((uint64_t)uuid->data[5]) << 32;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
+		tms += ((uint64_t)uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms =  ((uint64_t)uuid->data[0]) << 52;
+		tms += ((uint64_t)uuid->data[1]) << 44;
+		tms += ((uint64_t)uuid->data[2]) << 36;
+		tms += ((uint64_t)uuid->data[3]) << 28;
+		tms += ((uint64_t)uuid->data[4]) << 20;
+		tms += ((uint64_t)uuid->data[5]) << 12;
+		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
+		tms += ((uint64_t)uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
+			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Routine to extract UUID version from variant 0b10
+ * Returns NULL if UUID is not 0b10
+ */
+Datum
+uuid_extract_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
+
+/*
+ * Routine to extract UUID variant. Can return only 0, 0b10, 0b110 and 0b111.
+ */
+Datum
+uuid_extract_variant(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t result;
+
+	/*
+	 * The contents of the variant field, where the letter "x" indicates a
+	 * "don't-care" value.
+	 * ----------
+	 * Msb0		Msb1	Msb2	Msb3	Variant	Description
+	 * 0		x		x		x		1-7		Reserved, NCS backward
+	 * 											compatibility and includes Nil
+	 * 											UUID as per Section 5.9.
+	 * 1		0		x		x		8-9,A-B	The variant specified in RFC.
+	 * 1		1		0		x		C-D		Reserved, Microsoft Corporation
+	 * 											backward compatibility.
+	 * 1		1		1		x		E-F		Reserved for future definition
+	 * 											and includes Max UUID as per
+	 * 											Section 5.10 of RFC.
+	 * ----------
+	 */
+
+	uint8_t nibble = uuid->data[8] >> 4;
+	if (nibble < 8)
+		result = 0;
+	else if (nibble < 0xC)
+		result = 0b10;
+	else if (nibble < 0xE)
+		result = 0b110;
+	else
+		result = 0b111;
+
+	PG_RETURN_UINT16(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 291ed876fca..77f11020c63 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9167,6 +9167,21 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 1, 6 or 7',
+  proname => 'uuid_extract_timestamp', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_timestamp' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_version', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
+{ oid => '9899', descr => 'extract variant from UUID',
+  proname => 'uuid_extract_variant', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_variant' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e3..652aeb428e2 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d68..44542c56ef9 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,11 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_timestamp(uuid)
+uuid_extract_version(uuid)
+uuid_extract_variant(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d6..2930bcc7f08 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,76 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+ uuid_extract_version 
+----------------------
+                    5
+(1 row)
+
+SELECT uuid_extract_variant(uuidv7());
+ uuid_extract_variant 
+----------------------
+                    2
+(1 row)
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+ uuid_extract_timestamp 
+------------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d2..0f9bd1a6617 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,31 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+SELECT uuid_extract_variant(uuidv7());
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;

base-commit: 6d9751fa8fd40c988541c9c72ac7a2095ba73c19
-- 
2.34.1
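
The field layout described in the uuidv7() comment of patch 1/5 can be sketched
outside the backend as below. This is only an illustration under stated
assumptions: every sketch_* name is hypothetical and is not part of PostgreSQL,
and the random bytes are passed in by the caller instead of coming from
pg_strong_random(). It packs a 48-bit millisecond timestamp, the 18-bit counter
and the random tail, then reads the timestamp and counter back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_UUID_LEN 16

/*
 * Hypothetical helper (not PostgreSQL code): lay out a version 7 UUID from a
 * millisecond timestamp, an 18-bit counter and 8 caller-supplied random bytes.
 */
static void
sketch_pack_uuidv7(uint8_t out[SKETCH_UUID_LEN], uint64_t tms,
				   uint32_t counter, const uint8_t rnd[8])
{
	assert(tms < (UINT64_C(1) << 48));	/* unix_ts_ms is 48 bits wide */
	assert(counter <= 0x3ffff);			/* counter is 18 bits wide */

	/* octets 0-5: big-endian unix_ts_ms */
	for (int i = 0; i < 6; i++)
		out[i] = (uint8_t) (tms >> (40 - 8 * i));

	/* octets 8-15: random bytes; octet 8 is rewritten just below */
	memcpy(&out[8], rnd, 8);

	/* octet 6: ver (0b0111) plus the 4 most significant counter bits */
	out[6] = (uint8_t) (0x70 | ((counter >> 14) & 0x0f));
	/* octet 7: next 8 counter bits */
	out[7] = (uint8_t) (counter >> 6);
	/* octet 8: var (0b10) plus the 6 least significant counter bits */
	out[8] = (uint8_t) (0x80 | (counter & 0x3f));
}

/* Read the 48-bit millisecond timestamp back from octets 0-5. */
static uint64_t
sketch_uuidv7_timestamp_ms(const uint8_t u[SKETCH_UUID_LEN])
{
	uint64_t	tms = 0;

	for (int i = 0; i < 6; i++)
		tms = (tms << 8) | u[i];
	return tms;
}

/* Reassemble the 18-bit counter from octets 6, 7 and 8. */
static uint32_t
sketch_uuidv7_counter(const uint8_t u[SKETCH_UUID_LEN])
{
	return ((uint32_t) (u[6] & 0x0f) << 14) |
		((uint32_t) u[7] << 6) |
		(uint32_t) (u[8] & 0x3f);
}
```

Since octet 8 is overwritten with the variant and counter bits, only 7 of the
8 supplied random bytes (56 bits) survive in the result, which matches the
rand_b description in the comment.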

Attachment: v21-0003-Run-pgindent.patch (application/octet-stream)

From b1e23f1dac989cd9ea9fb6cf6093cb0adfd527b7 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Mon, 11 Mar 2024 15:54:57 +0100
Subject: [PATCH v21 3/5] Run pgindent

---
 src/backend/utils/adt/uuid.c | 119 ++++++++++++++++++-----------------
 1 file changed, 62 insertions(+), 57 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index a22efa3822a..93327d27a8b 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -509,18 +509,21 @@ Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
-	uint64_t tms;
+	uint64_t	tms;
 	struct timeval tp;
-	bool increment_counter;
+	bool		increment_counter;
 
 	gettimeofday(&tp, NULL);
-	tms = ((uint64_t)tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	tms = ((uint64_t) tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
 	/* time from clock is protected from backward leaps */
 	increment_counter = (tms <= previous_timestamp);
 
 	if (increment_counter)
 	{
-		/* Time did not advance from the previous generation, we must increment counter */
+		/*
+		 * Time did not advance from the previous generation, we must
+		 * increment counter
+		 */
 		++sequence_counter;
 		if (sequence_counter > 0x3ffff)
 		{
@@ -536,14 +539,14 @@ uuidv7(PG_FUNCTION_ARGS)
 		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
-					errmsg("could not generate random values")));
+					 errmsg("could not generate random values")));
 
 		/* most significant 4 bits of 18-bit counter */
-		uuid->data[6] = (unsigned char)(sequence_counter >> 14);
+		uuid->data[6] = (unsigned char) (sequence_counter >> 14);
 		/* next 8 bits */
-		uuid->data[7] = (unsigned char)(sequence_counter >> 6);
+		uuid->data[7] = (unsigned char) (sequence_counter >> 6);
 		/* least significant 6 bits */
-		uuid->data[8] = (unsigned char)(sequence_counter);
+		uuid->data[8] = (unsigned char) (sequence_counter);
 	}
 	else
 	{
@@ -551,31 +554,31 @@ uuidv7(PG_FUNCTION_ARGS)
 		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
 			ereport(ERROR,
 					(errcode(ERRCODE_INTERNAL_ERROR),
-					errmsg("could not generate random values")));
+					 errmsg("could not generate random values")));
 
 		/*
 		 * Left-most counter bits are initialized as zero for the sole purpose
-		 * of guarding against counter rollovers.
-		 * See section "Fixed-Length Dedicated Counter Seeding"
+		 * of guarding against counter rollovers. See section "Fixed-Length
+		 * Dedicated Counter Seeding"
 		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#monotonicity_counters
 		 */
 		uuid->data[6] = (uuid->data[6] & 0xf7);
 
 		/* read randomly initialized bits of counter */
-		sequence_counter = ((uint32_t)uuid->data[8] & 0x3f) +
-							(((uint32_t)uuid->data[7]) << 6) +
-							(((uint32_t)uuid->data[6] & 0x0f) << 14);
+		sequence_counter = ((uint32_t) uuid->data[8] & 0x3f) +
+			(((uint32_t) uuid->data[7]) << 6) +
+			(((uint32_t) uuid->data[6] & 0x0f) << 14);
 
 		previous_timestamp = tms;
 	}
 
 	/* Fill in time part */
-	uuid->data[0] = (unsigned char)(tms >> 40);
-	uuid->data[1] = (unsigned char)(tms >> 32);
-	uuid->data[2] = (unsigned char)(tms >> 24);
-	uuid->data[3] = (unsigned char)(tms >> 16);
-	uuid->data[4] = (unsigned char)(tms >> 8);
-	uuid->data[5] = (unsigned char)tms;
+	uuid->data[0] = (unsigned char) (tms >> 40);
+	uuid->data[1] = (unsigned char) (tms >> 32);
+	uuid->data[2] = (unsigned char) (tms >> 24);
+	uuid->data[3] = (unsigned char) (tms >> 16);
+	uuid->data[4] = (unsigned char) (tms >> 8);
+	uuid->data[5] = (unsigned char) tms;
 
 	/*
 	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
@@ -598,21 +601,21 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	TimestampTz ts;
-	uint64_t tms;
+	uint64_t	tms;
 
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
 	if ((uuid->data[6] & 0xf0) == 0x70)
 	{
-		tms =			  uuid->data[5];
-		tms += ((uint64_t)uuid->data[4]) << 8;
-		tms += ((uint64_t)uuid->data[3]) << 16;
-		tms += ((uint64_t)uuid->data[2]) << 24;
-		tms += ((uint64_t)uuid->data[1]) << 32;
-		tms += ((uint64_t)uuid->data[0]) << 40;
-
-		ts = (TimestampTz) (tms * 1000) - /* convert ms to us, than adjust */
+		tms = uuid->data[5];
+		tms += ((uint64_t) uuid->data[4]) << 8;
+		tms += ((uint64_t) uuid->data[3]) << 16;
+		tms += ((uint64_t) uuid->data[2]) << 24;
+		tms += ((uint64_t) uuid->data[1]) << 32;
+		tms += ((uint64_t) uuid->data[0]) << 40;
+
+		ts = (TimestampTz) (tms * 1000) -	/* convert ms to us, than adjust */
 			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
@@ -620,34 +623,36 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 	if ((uuid->data[6] & 0xf0) == 0x10)
 	{
-		tms =  ((uint64_t)uuid->data[0]) << 24;
-		tms += ((uint64_t)uuid->data[1]) << 16;
-		tms += ((uint64_t)uuid->data[2]) << 8;
-		tms += ((uint64_t)uuid->data[3]);
-		tms += ((uint64_t)uuid->data[4]) << 40;
-		tms += ((uint64_t)uuid->data[5]) << 32;
-		tms += (((uint64_t)uuid->data[6])&0xf) << 56;
-		tms += ((uint64_t)uuid->data[7]) << 48;
-
-		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
-			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		tms = ((uint64_t) uuid->data[0]) << 24;
+		tms += ((uint64_t) uuid->data[1]) << 16;
+		tms += ((uint64_t) uuid->data[2]) << 8;
+		tms += ((uint64_t) uuid->data[3]);
+		tms += ((uint64_t) uuid->data[4]) << 40;
+		tms += ((uint64_t) uuid->data[5]) << 32;
+		tms += (((uint64_t) uuid->data[6]) & 0xf) << 56;
+		tms += ((uint64_t) uuid->data[7]) << 48;
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us,
+										 * than adjust */
+			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
 
 	if ((uuid->data[6] & 0xf0) == 0x60)
 	{
-		tms =  ((uint64_t)uuid->data[0]) << 52;
-		tms += ((uint64_t)uuid->data[1]) << 44;
-		tms += ((uint64_t)uuid->data[2]) << 36;
-		tms += ((uint64_t)uuid->data[3]) << 28;
-		tms += ((uint64_t)uuid->data[4]) << 20;
-		tms += ((uint64_t)uuid->data[5]) << 12;
-		tms += (((uint64_t)uuid->data[6])&0xf) << 8;
-		tms += ((uint64_t)uuid->data[7]);
-
-		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us, than adjust */
-			((uint64_t)POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		tms = ((uint64_t) uuid->data[0]) << 52;
+		tms += ((uint64_t) uuid->data[1]) << 44;
+		tms += ((uint64_t) uuid->data[2]) << 36;
+		tms += ((uint64_t) uuid->data[3]) << 28;
+		tms += ((uint64_t) uuid->data[4]) << 20;
+		tms += ((uint64_t) uuid->data[5]) << 12;
+		tms += (((uint64_t) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64_t) uuid->data[7]);
+
+		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us,
+										 * than adjust */
+			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -663,7 +668,7 @@ Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
-	uint16_t result;
+	uint16_t	result;
 
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
@@ -679,12 +684,11 @@ Datum
 uuid_extract_variant(PG_FUNCTION_ARGS)
 {
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
-	uint16_t result;
+	uint16_t	result;
 
-	/*
+	/*-----------
 	 * The contents of the variant field, where the letter "x" indicates a
 	 * "don't-care" value.
-	 * ----------
 	 * Msb0		Msb1	Msb2	Msb3	Variant	Description
 	 * 0		x		x		x		1-7		Reserved, NCS backward
 	 * 											compatibility and includes Nil
@@ -695,10 +699,11 @@ uuid_extract_variant(PG_FUNCTION_ARGS)
 	 * 1		1		1		x		E-F		Reserved for future definition
 	 * 											and includes Max UUID as per
 	 * 											Section 5.10 of RFC.
-	 * ----------
+	 *-----------
 	 */
 
-	uint8_t nibble = uuid->data[8] >> 4;
+	uint8_t		nibble = uuid->data[8] >> 4;
+
 	if (nibble < 8)
 		result = 0;
 	else if (nibble < 0xC)
-- 
2.34.1

Attachment: v21-0005-Change-to-Fixed-Length-Dedicated-Counter-Bits.patch (application/octet-stream)

From 4a892e72e1a033fa51a37f80cb66abce9fbd0aae Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Mon, 11 Mar 2024 16:53:32 +0100
Subject: [PATCH v21 5/5] Change to Fixed-Length Dedicated Counter Bits

---
 src/backend/utils/adt/uuid.c | 59 ++++++++++++++----------------------
 1 file changed, 23 insertions(+), 36 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 535a7c08025..e86dbc8baf3 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -481,16 +481,17 @@ static uint64_t previous_timestamp = 0;
  *  Section 6.9. rand_b Occupies bits 66 through 127 (octets 8-15).
  * ----------
  *
- * Monotonic Random (Method 2) can be implemented with arbitrary size of a
- * counter. We choose size 18 to reuse all space of bytes that are touched by
- * ver and var fields + rand_a bytes between them.
+ * Fixed-Length Dedicated Counter Bits (Method 1) MAY use the left-most bits of
+ * rand_b as additional counter bits. We choose size 18 to reuse all space of
+ * bytes that are touched by ver and var fields + rand_a bytes between them.
  * Whenever timestamp unix_ts_ms is moving forward, this counter bits are
  * reinitialized. Reinilialization always sets most significant bit to 0, other
- * bits are initialized with random numbers. This gives as approximately 192K
+ * bits are initialized with random numbers. This gives as approximately 262K
  * UUIDs within one millisecond without overflow. This ougth to be enough for
  * most practical purposes. Whenever counter overflow happens, this overflow is
- * translated to increment of unix_ts_ms. So generation of UUIDs ate rate
- * higher than 128MHz might lead to using timestamps ahead of time.
+ * translated to increment of unix_ts_ms. So generation of UUIDs at a rate
+ * higher than 262MHz in the same backend might lead to using timestamps ahead
+ * of time.
  *
  * We're not using the "Replace Left-Most Random Bits with Increased Clock
  * Precision" method Section 6.2 (Method 3), because of portability concerns.
@@ -503,7 +504,9 @@ static uint64_t previous_timestamp = 0;
  * but this monotonicity is not strictly guaranteed. UUIDs generated on
  * different nodes are mostly monotonic with regards to possible clock drift.
  * Uniqueness of UUIDs generated at the same timestamp across different
- * backends and/or nodes is guaranteed by using random bits.
+ * backends and/or nodes is guaranteed by using random bits. Since we're still
+ * using 56 bits of random data in rand_b, so we're not expecting any
+ * collisions within the same millisecond.
  */
 Datum
 uuidv7(PG_FUNCTION_ARGS)
@@ -535,39 +538,11 @@ uuidv7(PG_FUNCTION_ARGS)
 		/* protection from leap backward */
 		tms = previous_timestamp;
 
-		/* fill everything after the timestamp and counter with random bytes */
-		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
-			ereport(ERROR,
-					(errcode(ERRCODE_INTERNAL_ERROR),
-					 errmsg("could not generate random values")));
-
-		/* most significant 4 bits of 18-bit counter */
-		uuid->data[6] = (unsigned char) (sequence_counter >> 14);
-		/* next 8 bits */
-		uuid->data[7] = (unsigned char) (sequence_counter >> 6);
-		/* least significant 6 bits */
-		uuid->data[8] = (unsigned char) (sequence_counter);
 	}
 	else
 	{
-		/* fill everything after the timestamp with random bytes */
-		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
-			ereport(ERROR,
-					(errcode(ERRCODE_INTERNAL_ERROR),
-					 errmsg("could not generate random values")));
-
-		/*
-		 * Left-most counter bits are initialized as zero for the sole purpose
-		 * of guarding against counter rollovers. See section "Fixed-Length
-		 * Dedicated Counter Seeding"
-		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#monotonicity_counters
-		 */
-		uuid->data[6] = (uuid->data[6] & 0xf7);
-
 		/* read randomly initialized bits of counter */
-		sequence_counter = ((uint32_t) uuid->data[8] & 0x3f) +
-			(((uint32_t) uuid->data[7]) << 6) +
-			(((uint32_t) uuid->data[6] & 0x0f) << 14);
+		sequence_counter = 0;
 
 		previous_timestamp = tms;
 	}
@@ -579,6 +554,18 @@ uuidv7(PG_FUNCTION_ARGS)
 	uuid->data[3] = (unsigned char) (tms >> 16);
 	uuid->data[4] = (unsigned char) (tms >> 8);
 	uuid->data[5] = (unsigned char) tms;
+	/* most significant 4 bits of 18-bit counter */
+	uuid->data[6] = (unsigned char) (sequence_counter >> 14);
+	/* next 8 bits */
+	uuid->data[7] = (unsigned char) (sequence_counter >> 6);
+	/* least significant 6 bits */
+	uuid->data[8] = (unsigned char) (sequence_counter);
+
+	/* fill everything after the timestamp and counter with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
 
 	/*
 	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
-- 
2.34.1
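
The counter handling that patch 5/5 arrives at (reset to zero when the clock
advances, increment otherwise, carry an overflow into the timestamp) can be
sketched as a small state machine. Assumptions: the sketch_* names and the
SketchV7State struct are hypothetical, and the clock reading and random bytes
are injected by the caller so the behaviour is repeatable. The sketch fills
octets 8-15 with random bytes before it writes the counter bytes, so the low
6 counter bits in octet 8 are kept; that ordering is a choice of this sketch.
An 18-bit counter starting at zero gives 2^18 = 262144 values per millisecond,
i.e. about 262 million per second before the timestamp has to run ahead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COUNTER_MAX 0x3ffff	/* 18-bit counter */

/* Hypothetical stand-in for the backend-local generator state. */
typedef struct SketchV7State
{
	uint64_t	previous_timestamp;
	uint32_t	sequence_counter;
} SketchV7State;

/*
 * Produce the next UUID for clock reading now_ms (milliseconds since the Unix
 * epoch), using 8 caller-supplied random bytes.
 */
static void
sketch_next_uuidv7(SketchV7State *st, uint64_t now_ms,
				   const uint8_t rnd[8], uint8_t out[16])
{
	uint64_t	tms = now_ms;

	assert(st != NULL);

	if (tms <= st->previous_timestamp)
	{
		/* clock stood still or went backwards: bump the counter */
		if (++st->sequence_counter > SKETCH_COUNTER_MAX)
		{
			/* counter overflow is carried into the timestamp */
			st->sequence_counter = 0;
			st->previous_timestamp++;
		}
		tms = st->previous_timestamp;
	}
	else
	{
		/* clock advanced: restart the counter */
		st->sequence_counter = 0;
		st->previous_timestamp = tms;
	}

	/* octets 0-5: big-endian timestamp */
	for (int i = 0; i < 6; i++)
		out[i] = (uint8_t) (tms >> (40 - 8 * i));

	/* random tail first, then the counter, so octet 8 keeps counter bits */
	memcpy(&out[8], rnd, 8);
	out[6] = (uint8_t) (0x70 | ((st->sequence_counter >> 14) & 0x0f));
	out[7] = (uint8_t) (st->sequence_counter >> 6);
	out[8] = (uint8_t) (0x80 | (st->sequence_counter & 0x3f));
}
```

Comparing successive results with memcmp() shows the intended ordering within
one backend: the second UUID sorts after the first even when the clock does
not move and the random tail happens to be smaller.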

Attachment: v21-0004-Fix-comments-a-bit.patch (application/octet-stream)

From d953b41914f04f878b6bd3c72e71a8fab8853ea3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Mon, 11 Mar 2024 15:56:28 +0100
Subject: [PATCH v21 4/5] Fix comments a bit

---
 src/backend/utils/adt/uuid.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 93327d27a8b..535a7c08025 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -615,7 +615,8 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 		tms += ((uint64_t) uuid->data[1]) << 32;
 		tms += ((uint64_t) uuid->data[0]) << 40;
 
-		ts = (TimestampTz) (tms * 1000) -	/* convert ms to us, than adjust */
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
 			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
@@ -632,8 +633,8 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 		tms += (((uint64_t) uuid->data[6]) & 0xf) << 56;
 		tms += ((uint64_t) uuid->data[7]) << 48;
 
-		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us,
-										 * than adjust */
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
 			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
@@ -650,8 +651,8 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 		tms += (((uint64_t) uuid->data[6]) & 0xf) << 8;
 		tms += ((uint64_t) uuid->data[7]);
 
-		ts = (TimestampTz) (tms / 10) - /* convert 100-ns intervals to us,
-										 * than adjust */
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
 			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
-- 
2.34.1

v21-0002-Fix-typos.patch (application/octet-stream)

From a40cdbd7bc9b7fb708f926d69dfb16dff62ec148 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <jelte.fennema@microsoft.com>
Date: Mon, 11 Mar 2024 15:51:06 +0100
Subject: [PATCH v21 2/5] Fix typos

---
 src/backend/utils/adt/uuid.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 581cd948a30..a22efa3822a 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -447,7 +447,7 @@ static uint64_t previous_timestamp = 0;
  * UUIDv7 Field and Bit Layout:
  * ----------
  *  0                   1                   2                   3
- 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |                           unix_ts_ms                          |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -480,23 +480,30 @@ static uint64_t previous_timestamp = 0;
  *  bits filled with pseudo-random data to provide uniqueness as per
  *  Section 6.9. rand_b Occupies bits 66 through 127 (octets 8-15).
  * ----------
- * 
+ *
  * Monotonic Random (Method 2) can be implemented with arbitrary size of a
  * counter. We choose size 18 to reuse all space of bytes that are touched by
  * ver and var fields + rand_a bytes between them.
  * Whenever timestamp unix_ts_ms is moving forward, this counter bits are
- * reinitialized. Rinilialization always sets most significant bit to 0, other
+ * reinitialized. Reinitialization always sets most significant bit to 0, other
  * bits are initialized with random numbers. This gives as approximately 192K
- * UUIDs within one millisecond without overflow. Outh to be enough.
- * Whenever counter overflow happens, this overflow is translated to increment
- * of unix_ts_ms. So generation of UUIDs ate rate higher than 128MHz might lead
- * to using timestamps ahead of time.
+ * UUIDs within one millisecond without overflow. This ought to be enough for
+ * most practical purposes. Whenever counter overflow happens, this overflow is
+ * translated to increment of unix_ts_ms. So generation of UUIDs at a rate
+ * higher than 128MHz might lead to using timestamps ahead of time.
+ *
+ * We're not using the "Replace Left-Most Random Bits with Increased Clock
+ * Precision" method Section 6.2 (Method 3), because of portability concerns.
+ * It's unclear if all supported platforms can provide reliable microsecond
+ * precision time.
  *
  * All UUID generator state is backend-local. For UUIDs generated in one
  * backend we guarantee monotonicity. UUIDs generated on different backends
  * will be mostly monotonic if they are generated at frequences less than 1KHz,
  * but this monotonicity is not strictly guaranteed. UUIDs generated on
  * different nodes are mostly monotonic with regards to possible clock drift.
+ * Uniqueness of UUIDs generated at the same timestamp across different
+ * backends and/or nodes is guaranteed by using random bits.
  */
 Datum
 uuidv7(PG_FUNCTION_ARGS)
@@ -550,7 +557,7 @@ uuidv7(PG_FUNCTION_ARGS)
 		 * Left-most counter bits are initialized as zero for the sole purpose
 		 * of guarding against counter rollovers.
 		 * See section "Fixed-Length Dedicated Counter Seeding"
-		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-09#monotonicity_counters
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#monotonicity_counters
 		 */
 		uuid->data[6] = (uuid->data[6] & 0xf7);
 
-- 
2.34.1

#104

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Jelte Fennema-Nio (#103)

Re: UUID v7

On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

Now with the added comments, one thing pops out to me: The comments
mention that we use "Monotonic Random", but when I read the spec that
explicitly recommends against using an increment of 1 when using
monotonic random. I feel like if we use an increment of 1, we're
better off going for the "Fixed-Length Dedicated Counter Bits" method
(i.e. change the code to start the counter at 0). See patch 0005 for
an example of that change.

I'm also wondering if we really want to use the extra rand_b bits for
this. The spec says we MAY, but it does remove the amount of
randomness in our UUIDs.

Method 1 is just Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute to global sortability of values generated at frequency less than 1KHz.
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated within any given millisecond. The probability of a collision of two UUIDs generated at a frequency of less than 1KHz is p = 2^-(c2+d).
The capacity of a counter has an expected value of c = 2^(c1)*2^(c2-1)/2^c3.
To guess the next UUID, you have to correctly pick one of u = 2^(d+c3) values.

First, observe that c3 contributes to unguessability at exactly the same scale as it decreases counter capacity. There is no difference between using bits in d directly, or in c3. There is no point in a non-zero c3. Every bit that could be given to c3 can equally be given to d.

Second, observe that c2 bits contribute to both collision protection and counter capacity! And when the time ticks, c2 also contributes to unguessability! So, technically, we should consider using all available bits as c2 bits.

How many c1 bits do we need? I've chosen one - to prevent occasional counter capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and d = 56 as an arbitrary compromise between capacity of one backend per ms and prevention of global collision.
This compromise is mostly dictated by the maximum frequency of UUID generation by one backend; I've chosen 200MHz as a sane value.

This compromise is much harder when you do not have 74 spare bits; this crazy amount of information forgives almost any mistake. Imagine you have to distribute 10 bits between c2 and d, and you try to prevent collisions between 10 independent devices which each need the capacity to generate IDs at a frequency of 10KHz and keep sortability. You would have something like c1=1, c2=3, d=6.
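
The arithmetic above can be written down as a minimal C sketch. It only evaluates the three formulas as stated in this message for a given split of the spare bits. The `bit_budget` struct and all helper names are hypothetical and are not part of the patch, and c3 is assumed to be integral to keep the example in integer arithmetic.

```c
#include <stdint.h>

/* Fixed field widths of a UUIDv7, in bits. */
enum { UUID_BITS = 128, VER_BITS = 4, VAR_BITS = 2, UNIX_TS_MS_BITS = 48 };

/* Hypothetical container for one way of splitting the spare bits. */
typedef struct
{
	int			c1;		/* counter bits initialised with 0 on a time tick */
	int			c2;		/* counter bits initialised with randomness (>= 1) */
	int			c3;		/* bit width of the counter step, not part of the 128 */
	int			d;		/* purely random bits */
} bit_budget;

/* Bits left once ver, var and unix_ts_ms are accounted for. */
static int
spare_bits(void)
{
	return UUID_BITS - VER_BITS - VAR_BITS - UNIX_TS_MS_BITS;
}

/* c1, c2 and d together must use exactly the spare bits. */
static int
budget_fits(bit_budget b)
{
	return b.c1 + b.c2 + b.d == spare_bits();
}

/* Collision probability is p = 2^-(c2+d); return the exponent c2 + d. */
static int
collision_exponent(bit_budget b)
{
	return b.c2 + b.d;
}

/* Counter capacity as given above: c = 2^c1 * 2^(c2-1) / 2^c3. */
static uint64_t
counter_capacity(bit_budget b)
{
	return ((UINT64_C(1) << b.c1) * (UINT64_C(1) << (b.c2 - 1))) >> b.c3;
}

/* An attacker must pick one of u = 2^(d+c3) values; return d + c3. */
static int
guess_exponent(bit_budget b)
{
	return b.d + b.c3;
}
```

For the split discussed here (c1=1, c2=17, c3=0, d=56) the helpers give 74 spare bits, a collision exponent of 73, and a capacity of 2^17 counter values per millisecond under the formula as written.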

Sorry for this long and vague explanation; if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserves to be commented, because it still is quite close to random. The RFC gives us too much freedom of choice.

Thanks!

Best regards, Andrey Borodin.

#105

michael@paquier.xyz

almost 2 years ago

In reply to: Andrey M. Borodin (#104)

Re: UUID v7

On Mon, Mar 11, 2024 at 11:27:43PM +0500, Andrey M. Borodin wrote:

Sorry for this long and vague explanation, if it still seems too
uncertain we can have a chat or something like that. I don't think
this number picking stuff deserve to be commented, because it still
is quite close to random. RFC gives us too much freedom of choice.

Speaking about the RFC, I can see that there is a draft but nothing
formal yet. The last one I can see is v14 from last November:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-14

It does not strike me as a good idea to rush an implementation without
a specification officially approved because there is always a risk of
shipping something that's non-compliant into core. But perhaps I am
missing something on the RFC side?
--
Michael

#106

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Michael Paquier (#105)

Re: UUID v7

On 12 Mar 2024, at 10:53, Michael Paquier <michael@paquier.xyz> wrote:

It does not strike me as a good idea to rush an implementation without
a specification officially approved because there is always a risk of
shipping something that's non-compliant into core. But perhaps I am
missing something on the RFC side?

Upthread, one of the document's authors commented:

On 14 Feb 2023, at 19:13, Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:

The point is 99% of the work since adoption by the IETF has been ironing out
RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a
very good state.

And also

On 22 Jan 2024, at 09:22, Nikolay Samokhvalov <nik@postgres.ai> wrote:

And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

So at least reviewing patch and agreeing on chosen methods and constants makes sense.

Best regards, Andrey Borodin.

#107

michael@paquier.xyz

almost 2 years ago

In reply to: Andrey M. Borodin (#106)

Re: UUID v7

On Tue, Mar 12, 2024 at 11:10:37AM +0500, Andrey M. Borodin wrote:

On 12 Mar 2024, at 10:53, Michael Paquier <michael@paquier.xyz> wrote:

On 22 Jan 2024, at 09:22, Nikolay Samokhvalov <nik@postgres.ai> wrote:

And many libraries are already including implementation of UUIDv7 – here are some examples:

- https://www.npmjs.com/package/uuidv7
- https://crates.io/crates/uuidv7
- https://github.com/google/uuid/pull/139

So at least reviewing patch and agreeing on chosen methods and constants makes sense.

Sure, there is no problem in discussing a patch to implement a
behavior. But I disagree about taking a risk in merging something
that could become non-compliant with the approved RFC, if the draft is
approved at the end, of course. This just strikes me as a bad idea.
--
Michael

#108

postgres@jeltef.nl

almost 2 years ago

In reply to: Andrey M. Borodin (#104)

Re: UUID v7

On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us too much freedom of choice.

I thought your explanation was quite clear and I agree that this
approach makes the most sense. I sent an email to the RFC authors to
ask for their feedback with you (Andrey) in the CC, because even
though it makes the most sense, it does not comply with either
method 1 or 2 as described in the RFC.

#109

postgres@jeltef.nl

almost 2 years ago

In reply to: Michael Paquier (#107)

Re: UUID v7

On Tue, 12 Mar 2024 at 07:32, Michael Paquier <michael@paquier.xyz> wrote:

Sure, there is no problem in discussing a patch to implement a
behavior. But I disagree about taking a risk in merging something
that could become non-compliant with the approved RFC, if the draft is
approved at the end, of course. This just strikes me as a bad idea.

I agree that we shouldn't release UUIDv7 support if the RFC describing
that is not yet approved. But I do think it would be a shame if e.g.
the RFC got approved 2 weeks after the Postgres feature freeze, which
would then mean we'd have to wait another 1.5 years before actually
using uuidv7. Would it be a reasonable compromise to still merge the
patch for PG17 (assuming the code is good to merge with regards to the
current draft RFC), but revert the commit if the RFC is not approved
before some deadline before the release date (e.g. before the first
release candidate)?

#110

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Jelte Fennema-Nio (#108)

Re: UUID v7

Hi Jelte,
I am one of the contributors to this RFC.

Andrey's patch corresponds exactly to Fixed-Length Dedicated Counter Bits (Method 1).

Andrey and you simply did not read the RFC a little further down in the text:
__________________________________________________________________

The following sub-topics cover topics related solely with creating reliable fixed-length dedicated counters:

- Fixed-Length Dedicated Counter Seeding:
  Implementations utilizing the fixed-length counter method randomly initialize the counter with each new timestamp tick. However, when the timestamp has not increased, the counter is instead incremented by the desired increment logic. When utilizing a randomly seeded counter alongside Method 1, the random value MAY be regenerated with each counter increment without impacting sortability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments. Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion of the counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized. The remaining most significant, left-most counter bit is initialized as zero for the sole purpose of guarding against counter rollovers.

- Fixed-Length Dedicated Counter Length:
  Select a counter bit-length that can properly handle the level of timestamp precision in use. For example, millisecond precision generally requires a larger counter than a timestamp with nanosecond precision. General guidance is that the counter SHOULD be at least 12 bits but no longer than 42 bits. Care must be taken to ensure that the counter length selected leaves room for sufficient entropy in the random portion of the UUID after the counter. This entropy helps improve the unguessability characteristics of UUIDs created within the batch.

The following sub-topics cover rollover handling with either type of counter method:

- ...

- Counter Rollover Handling:
  Counter rollovers MUST be handled by the application to avoid sorting issues. The general guidance is that applications that care about absolute monotonicity and sortability should freeze the counter and wait for the timestamp to advance which ensures monotonicity is not broken. Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter.
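
The seeding and rollover rules quoted above can be illustrated with a small C sketch. It is not the code from the patch: `gen_state`, `next_tick` and the `random_bits` parameter are hypothetical names, an 18-bit counter is assumed to match the patch under discussion, and the rollover branch takes the "increment the timestamp ahead of the actual time" option and reseeds the counter.

```c
#include <stdint.h>

#define COUNTER_BITS	18
#define COUNTER_MAX		((UINT32_C(1) << COUNTER_BITS) - 1)
/* Seed mask: left-most counter bit stays zero to guard against rollover. */
#define SEED_MASK		(COUNTER_MAX >> 1)

/* Hypothetical generator state, local to one process. */
typedef struct
{
	uint64_t	prev_ms;	/* last timestamp handed out */
	uint32_t	counter;	/* current counter value */
} gen_state;

/*
 * Advance the state for one new UUID and return the timestamp to embed.
 * now_ms is the clock reading; random_bits is entropy supplied by the
 * caller (an assumption made so the sketch stays deterministic).
 */
static uint64_t
next_tick(gen_state *st, uint64_t now_ms, uint32_t random_bits)
{
	if (now_ms > st->prev_ms)
	{
		/* New timestamp tick: reseed the counter randomly. */
		st->prev_ms = now_ms;
		st->counter = random_bits & SEED_MASK;
	}
	else if (st->counter == COUNTER_MAX)
	{
		/* Rollover: move the timestamp ahead of real time and reseed. */
		st->prev_ms += 1;
		st->counter = random_bits & SEED_MASK;
	}
	else
	{
		/* Clock stalled or went backwards: keep timestamp, bump counter. */
		st->counter++;
	}
	return st->prev_ms;
}
```

Because the comparison is against the last timestamp handed out, a clock that steps backwards is treated the same as one that has not advanced, so the (timestamp, counter) pair never decreases within one process.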

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Tuesday, 12 March 2024 at 06:36:13 pm GMT+3, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Mon, 11 Mar 2024 at 19:27, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us too much freedom of choice.

#111

postgres@jeltef.nl

almost 2 years ago

In reply to: Sergey Prokhorenko (#110)

Re: UUID v7

On Tue, 12 Mar 2024 at 18:18, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Andrey and you simply did not read the RFC a little further down in the text:

You're totally right, sorry about that. Maybe it would be good to move
those subsections around a bit in the RFC though, so that anything
related to only one method is included in the section for that method.

#112

peter@eisentraut.org

almost 2 years ago

In reply to: Andrey M. Borodin (#100)

Re: UUID v7

On 10.03.24 13:59, Andrey M. Borodin wrote:

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant. Otherwise, it's hard
to tell them apart, with only one letter different.

Renamed.

Another related comment: Throughout your patch, swap the order of
uuid_extract_variant and uuid_extract_version. First, this makes more
sense because version is subordinate to variant, and also it makes it
alphabetical.

I think the behavior of uuid_extract_var(iant) is wrong. The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

I think we are still interpreting this differently. I think
uuid_extract_variant should just return whatever is in those four bits.
Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.",
which I don't think is correct. It should return 0 through 15.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.

I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are there across different OSes?

I think this should have been researched. If we don't know how many
bits we have, how do we know we have enough for milliseconds? I think
we should at least have some kind of idea, if we are going to have this
conversation.

#113

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Peter Eisentraut (#112)

Re: UUID v7

On 14 Mar 2024, at 16:07, Peter Eisentraut <peter@eisentraut.org> wrote:

On 10.03.24 13:59, Andrey M. Borodin wrote:

The functions uuid_extract_ver and uuid_extract_var could be named
uuid_extract_version and uuid_extract_variant. Otherwise, it's hard
to tell them apart, with only one letter different.

Renamed.

Another related comment: Throughout your patch, swap the order of uuid_extract_variant and uuid_extract_version. First, this makes more sense because version is subordinate to variant, and also it makes it alphabetical.

I will do it soon.

I think the behavior of uuid_extract_var(iant) is wrong. The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

I think we are still interpreting this differently. I think uuid_extract_variant should just return whatever is in those four bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think is correct. It should return 0 through 15.

We will return "do not care" bits. These bits can confuse someone. E.g. for variant 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some reason the document lists numbers 1-15, but you are correct that the range is 0-15.

I would have expected that, since gettimeofday() provides microsecond
precision, we'd put the extra precision into "rand_a" as per Section 6.2 method 3.

I had chosen method 2 over method 3 as most portable. Can we be sure how many bits (after reading milliseconds) are there across different OSes?

I think this should have been researched. If we don't know how many bits we have, how do we know we have enough for milliseconds? I think we should at least have some kind of idea, if we are going to have this conversation.

Bits for milliseconds are strictly defined by the document: there are always 48 bits, independently of clock resolution.
But I don't think it's the main problem for Method 3. Method 1 actually guarantees strictly increasing order of UUIDs generated by a single backend. Method 3 can generate a lot of unsorted data in case of time leaping backward.
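
To make the fixed 48-bit width concrete, here is a short C sketch of reducing a seconds/microseconds clock reading to milliseconds and storing it big-endian in octets 0-5. The helper names `to_unix_ms`, `put_unix_ts_ms` and `get_unix_ts_ms` are hypothetical; only the arithmetic and byte layout follow the patch and the draft's field description.

```c
#include <stdint.h>

/* Reduce a (seconds, microseconds) reading to whole milliseconds. */
static uint64_t
to_unix_ms(uint64_t sec, uint64_t usec)
{
	return sec * 1000 + usec / 1000;
}

/* Store the low 48 bits of tms big-endian in octets 0-5 of the UUID. */
static void
put_unix_ts_ms(unsigned char data[16], uint64_t tms)
{
	data[0] = (unsigned char) (tms >> 40);
	data[1] = (unsigned char) (tms >> 32);
	data[2] = (unsigned char) (tms >> 24);
	data[3] = (unsigned char) (tms >> 16);
	data[4] = (unsigned char) (tms >> 8);
	data[5] = (unsigned char) tms;
}

/* Read the 48-bit millisecond timestamp back from octets 0-5. */
static uint64_t
get_unix_ts_ms(const unsigned char data[16])
{
	uint64_t	tms = 0;

	for (int i = 0; i < 6; i++)
		tms = (tms << 8) | data[i];
	return tms;
}
```

Whatever sub-millisecond precision the clock offers is discarded by the division, so the field layout does not depend on the platform's clock resolution.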

BTW Kyzer (in an off-list discussion) and Sergey confirmed that the method implemented in the patch is actually Method 1.

Best regards, Andrey Borodin.

#114

peter@eisentraut.org

almost 2 years ago

In reply to: Andrey M. Borodin (#113)

Re: UUID v7

On 14.03.24 12:25, Andrey M. Borodin wrote:

I think the behavior of uuid_extract_var(iant) is wrong. The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

I think we are still interpreting this differently. I think uuid_extract_variant should just return whatever is in those four bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think is correct. It should return 0 through 15.

We will return "do not care" bits. These bits can confuse someone. E.g. for variant 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some reason the document lists numbers 1-15, but you are correct that the range is 0-15.

I agree it's confusing. Before I studied the RFC 4122bis project, I
didn't even know about variant vs. version. I think overall people will
find this more confusing than useful. If you just want to know, "is
this UUID of the kind specified in RFC 4122", you can query it with
uuid_extract_version(x) IS NOT NULL. So maybe we don't need the
_extract_variant function?

#115

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Peter Eisentraut (#114)

Re: UUID v7

On 14 Mar 2024, at 20:10, Peter Eisentraut <peter@eisentraut.org> wrote:

I think the behavior of uuid_extract_var(iant) is wrong. The code
takes just two bits to return, but the draft document is quite clear
that the variant is 4 bits (see Table 1).

Well, it was correct only for implemented variant. I've made version that implements full table 1 from section 4.1.

I think we are still interpreting this differently. I think uuid_extract_variant should just return whatever is in those four bits. Your function comment says "Can return only 0, 0b10, 0b110 and 0b111.", which I don't think is correct. It should return 0 through 15.

We will return "do not care" bits. These bits can confuse someone. E.g. for variant 0b10 we can return 8, 9, 10 and 11 randomly. Is it OK? BTW for some reason the document lists numbers 1-15, but you are correct that the range is 0-15.

I agree it's confusing. Before I studied the RFC 4122bis project, I didn't even know about variant vs. version. I think overall people will find this more confusing than useful. If you just want to know, "is this UUID of the kind specified in RFC 4122", you can query it with uuid_extract_version(x) IS NOT NULL. So maybe we don't need the _extract_variant function?

I think it's the best possible solution. The variant has no value besides detecting if a version can be extracted.
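
As an illustration of that point, a version extractor only needs the variant to decide whether a version exists at all. The following is a rough C sketch, not the patch's implementation: the function name is hypothetical and the return value -1 stands in for SQL NULL.

```c
/*
 * Return the version (0..15) of a UUID whose variant is 0b10xx, the
 * variant described by the draft, or -1 (standing in for SQL NULL)
 * for any other variant.
 */
static int
uuid_version_or_null(const unsigned char data[16])
{
	/* The variant lives in the two most significant bits of octet 8. */
	if ((data[8] & 0xC0) != 0x80)
		return -1;

	/* The version is the high nibble of octet 6. */
	return data[6] >> 4;
}
```

The two low bits of the variant nibble are the "do not care" bits mentioned earlier in the thread, which is why returning the raw nibble would yield any of 8, 9, 10 or 11 for the same variant.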

Best regards, Andrey Borodin.

#116

aleksander@timescale.com

almost 2 years ago

In reply to: Andrey M. Borodin (#115)

Re: UUID v7

Hi,

So maybe we don't need the _extract_variant function?

I think it's the best possible solution. The variant has no value besides detecting if a version can be extracted.

+1 to the idea. I doubt that anyone will miss it.

--
Best regards,
Aleksander Alekseev

#117

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Aleksander Alekseev (#116)

1 attachment(s)

Re: UUID v7

On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:

+1 to the idea. I doubt that anyone will miss it.

PFA v22.

Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Removed all traces of uuid_extract_variant()

Thanks!

Best regards, Andrey Borodin.

Attachments:

v22-0001-Implement-UUID-v7.patch (application/octet-stream)

From d74968fe65bf0760723347b420bc2dcf8938fbb9 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Sun, 20 Aug 2023 23:55:31 +0300
Subject: [PATCH v22] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important function
here is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit adds the alias uuidv4() for the function
gen_random_uuid().

We also add a function to extract the timestamp from UUID v1, v6 and v7.
To allow users to distinguish various UUID versions we add the function
uuid_extract_version().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  46 +++-
 src/backend/utils/adt/uuid.c             | 254 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  12 ++
 src/include/datatype/timestamp.h         |   3 +-
 src/test/regress/expected/opr_sanity.out |   4 +
 src/test/regress/expected/uuid.out       |  65 ++++++
 src/test/regress/sql/uuid.sql            |  25 +++
 7 files changed, 404 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 72c5175e3b..06a8f3a0d6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14127,13 +14127,53 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_timestamp</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_extract_version</primary>
+  </indexterm>
+
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
+  </para>
+
+  <para>
+<synopsis>
+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a <type>timestamp with time zone</type> from UUID versions 1, 6 and 7.
+   For other versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the version bits from a UUID of the variant described by
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard draft</ulink>
+   (b10xx variant). For other variants this function returns NULL.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index e9c1ec6153..80bce72276 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,13 +13,18 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
-#include "utils/fmgrprotos.h"
+#include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
+#include "utils/timestamp.h"
 #include "utils/uuid.h"
 
 /* sortsupport for uuid */
@@ -406,6 +411,11 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Routine to generate UUID version 4.
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -425,3 +435,245 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	PG_RETURN_UUID_P(uuid);
 }
+
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+/*
+ * Routine to generate UUID version 7.
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * UUIDv7 Field and Bit Layout:
+ * ----------
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                           unix_ts_ms                          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          unix_ts_ms           |  ver  |       rand_a          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |var|                        rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                            rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * unix_ts_ms:
+ *  48 bit big-endian unsigned number of Unix epoch timestamp in milliseconds
+ *  as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).
+ *
+ * ver:
+ *  The 4 bit version field as defined by Section 4.2, set to 0b0111 (7).
+ *  Occupies bits 48 through 51 of octet 6.
+ *
+ * rand_a:
+ *  Most significant 12 bits of 18-bit counter. This counter is designed to
+ *  guarantee additional monotonicity as per Section 6.2 (Method 1). rand_a
+ *  occupies bits 52 through 63 (octets 6-7).
+ *
+ * var:
+ *  The 2 bit variant field as defined by Section 4.1, set to 0b10. Occupies
+ *  bits 64 and 65 of octet 8.
+ *
+ * rand_b:
+ *  Starting 6 bits are least significant 6 bits of a counter. The final 56
+ *  bits filled with pseudo-random data to provide uniqueness as per
+ *  Section 6.9. rand_b Occupies bits 66 through 127 (octets 8-15).
+ * ----------
+ *
+ * "Fixed-Length Dedicated Counter Bits" (Method 1) can be implemented with
+ * an arbitrary counter size. We choose 18 bits to reuse all the space in the
+ * bytes touched by the ver and var fields, plus the rand_a bits between them.
+ * Whenever the timestamp unix_ts_ms moves forward, the counter bits are
+ * reinitialized. Reinitialization always sets the most significant bit to 0;
+ * the other bits are initialized with random numbers. This gives us
+ * approximately 192K UUIDs within one millisecond without overflow, which
+ * ought to be enough for most practical purposes. Whenever the counter
+ * overflows, the overflow is carried into unix_ts_ms. So generating UUIDs at
+ * a rate higher than 128MHz might lead to using timestamps ahead of time.
+ *
+ * We're not using the "Replace Left-Most Random Bits with Increased Clock
+ * Precision" method (Section 6.2, Method 3) because of portability concerns.
+ * It's unclear if all supported platforms can provide reliable microsecond
+ * precision time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequencies less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ * Uniqueness of UUIDs generated at the same timestamp across different
+ * backends and/or nodes is guaranteed by using random bits.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t	tms;
+	struct timeval tp;
+	bool		increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t) tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/*
+		 * Time did not advance from the previous generation, we must
+		 * increment counter
+		 */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char) (sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char) (sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char) (sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers. See section "Fixed-Length
+		 * Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t) uuid->data[8] & 0x3f) +
+			(((uint32_t) uuid->data[7]) << 6) +
+			(((uint32_t) uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (tms >> 40);
+	uuid->data[1] = (unsigned char) (tms >> 32);
+	uuid->data[2] = (unsigned char) (tms >> 24);
+	uuid->data[3] = (unsigned char) (tms >> 16);
+	uuid->data[4] = (unsigned char) (tms >> 8);
+	uuid->data[5] = (unsigned char) tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (time-ordered) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Routine to extract the timestamp from a UUID of variant 0b10.
+ * Returns NULL if the variant is not 0b10 or the version is not 1, 6 or 7.
+ */
+Datum
+uuid_extract_timestamp(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	TimestampTz ts;
+	uint64_t	tms;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+
+	if ((uuid->data[6] & 0xf0) == 0x70)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64_t) uuid->data[4]) << 8;
+		tms += ((uint64_t) uuid->data[3]) << 16;
+		tms += ((uint64_t) uuid->data[2]) << 24;
+		tms += ((uint64_t) uuid->data[1]) << 32;
+		tms += ((uint64_t) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x10)
+	{
+		tms = ((uint64_t) uuid->data[0]) << 24;
+		tms += ((uint64_t) uuid->data[1]) << 16;
+		tms += ((uint64_t) uuid->data[2]) << 8;
+		tms += ((uint64_t) uuid->data[3]);
+		tms += ((uint64_t) uuid->data[4]) << 40;
+		tms += ((uint64_t) uuid->data[5]) << 32;
+		tms += (((uint64_t) uuid->data[6]) & 0xf) << 56;
+		tms += ((uint64_t) uuid->data[7]) << 48;
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if ((uuid->data[6] & 0xf0) == 0x60)
+	{
+		tms = ((uint64_t) uuid->data[0]) << 52;
+		tms += ((uint64_t) uuid->data[1]) << 44;
+		tms += ((uint64_t) uuid->data[2]) << 36;
+		tms += ((uint64_t) uuid->data[3]) << 28;
+		tms += ((uint64_t) uuid->data[4]) << 20;
+		tms += ((uint64_t) uuid->data[5]) << 12;
+		tms += (((uint64_t) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64_t) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	PG_RETURN_NULL();
+}
+
+/*
+ * Routine to extract UUID version from variant 0b10
+ * Returns NULL if the UUID variant is not 0b10
+ */
+Datum
+uuid_extract_version(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
+	uint16_t	result;
+
+	if ((uuid->data[8] & 0xc0) != 0x80)
+		PG_RETURN_NULL();
+	result = uuid->data[6] >> 4;
+
+	PG_RETURN_UINT16(result);
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 700f7daf7b..ddac2f51f6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9170,6 +9170,18 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'extract timestamp from UUID version 1, 6 or 7',
+  proname => 'uuid_extract_timestamp', proleakproof => 't',
+  prorettype => 'timestamptz', proargtypes => 'uuid', prosrc => 'uuid_extract_timestamp' },
+{ oid => '9898', descr => 'extract version from RFC 4122 UUID',
+  proname => 'uuid_extract_version', proleakproof => 't',
+  prorettype => 'int2', proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
 # pg_lsn
 { oid => '3229', descr => 'I/O',
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 3a37cb661e..652aeb428e 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -230,9 +230,10 @@ struct pg_itm_in
 	 ((y) < JULIAN_MAXYEAR || \
 	  ((y) == JULIAN_MAXYEAR && ((m) < JULIAN_MAXMONTH))))
 
-/* Julian-date equivalents of Day 0 in Unix and Postgres reckoning */
+/* Julian-date equivalents of Day 0 in Unix, Postgres and Gregorian epochs */
 #define UNIX_EPOCH_JDATE		2440588 /* == date2j(1970, 1, 1) */
 #define POSTGRES_EPOCH_JDATE	2451545 /* == date2j(2000, 1, 1) */
+#define GREGORIAN_EPOCH_JDATE	2299161 /* == date2j(1582,10,15) */
 
 /*
  * Range limits for dates and timestamps.
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 7610b011d6..2b5d67b7f0 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,10 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
+uuid_extract_timestamp(uuid)
+uuid_extract_version(uuid)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8e7f21910d..15dab4e1c2 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,5 +168,70 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+ uuid_extract_version 
+----------------------
+                    5
+(1 row)
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+ uuid_extract_timestamp 
+------------------------
+ 
+(1 row)
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+ ?column? 
+----------
+ t
+(1 row)
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 9a8f437c7d..0d1b041b6c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,5 +85,30 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- support functions for UUID versions and variants
+SELECT uuid_extract_version(uuidv7());
+SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111') IS NULL;
+SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');
+
+-- uuid_extract_timestamp() must refuse to accept non-UUIDv7
+SELECT uuid_extract_timestamp(gen_random_uuid());
+
+-- extract UUID v1, v6 and v7 timestamp
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';
+
 -- clean up
 DROP TABLE guid1, guid2 CASCADE;
-- 
2.37.1 (Apple Git-137.1)
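
The 18-bit counter layout described in the patch comment (4 bits in octet 6, 8 bits in octet 7, 6 bits in octet 8) can be sketched as a pair of standalone helpers. This is only an illustration: the names pack_counter18() and unpack_counter18() are hypothetical and do not appear in the patch, and unlike the patch, which writes the counter first and masks in ver and var afterwards, this sketch preserves whatever ver and var bits are already in the buffer.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: store an 18-bit counter in
 * octets 6..8 of a 16-byte UUID buffer. The top nibble of octet 6 (ver) and
 * the top two bits of octet 8 (var) are left untouched.
 */
static void
pack_counter18(uint8_t uuid[16], uint32_t counter)
{
	assert(counter <= 0x3ffff);

	/* most significant 4 bits of the counter */
	uuid[6] = (uint8_t) ((uuid[6] & 0xf0) | ((counter >> 14) & 0x0f));
	/* next 8 bits */
	uuid[7] = (uint8_t) ((counter >> 6) & 0xff);
	/* least significant 6 bits */
	uuid[8] = (uint8_t) ((uuid[8] & 0xc0) | (counter & 0x3f));
}

/* Hypothetical inverse: read the 18-bit counter back from octets 6..8. */
static uint32_t
unpack_counter18(const uint8_t uuid[16])
{
	return ((uint32_t) (uuid[6] & 0x0f) << 14) |
		((uint32_t) uuid[7] << 6) |
		((uint32_t) (uuid[8] & 0x3f));
}
```

Packing a value and unpacking it again should return the same value, and the ver and var bits should survive the round trip.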

#118

peter@eisentraut.org

almost 2 years ago

In reply to: Andrey M. Borodin (#117)

Re: UUID v7

On 16.03.24 18:43, Andrey M. Borodin wrote:

On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:

+1 to the idea. I doubt that anyone will miss it.

PFA v22.

Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()

I have committed a subset of this for now, namely the additions of
uuid_extract_timestamp() and uuid_extract_version(). These seemed
mature and agreed upon. You can rebase the rest of your patch on top of
that.

I have started a separate discussion to learn about the precision we can
expect from gettimeofday().

#119

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Peter Eisentraut (#118)

1 attachment(s)

Re: UUID v7

On 19 Mar 2024, at 13:55, Peter Eisentraut <peter@eisentraut.org> wrote:

On 16.03.24 18:43, Andrey M. Borodin wrote:

On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:

+1 to the idea. I doubt that anyone will miss it.

PFA v22.
Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()

I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and uuid_extract_version(). These seemed mature and agreed upon. You can rebase the rest of your patch on top of that.

Great! Thank you! PFA v23 with rebase on HEAD.

I have started a separate discussion to learn about the precision we can expect from gettimeofday().

Even in the presence of a real, portable, microsecond-capable timer, using microseconds does not seem to me an optimal way of utilising UUID bits.

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

Time-based bits contribute to global uniqueness, but they are certainly not as effective as counter bits.

Time-based bits do not provide local sortability guarantees: some UUIDs might get the same microsecond or be affected by backward leaps.

I think that microseconds are good only for hardware-specific solutions, not for something that runs on a variety of platforms, OSes and devices.

Best regards, Andrey Borodin.

Attachments:

v23-0001-Implement-UUID-v7.patch (application/octet-stream)

From b69c860c9fb793d56493053042e5beceaf069123 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v23] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit also adds uuidv4() as an alias for
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  38 ++++-
 src/backend/utils/adt/uuid.c             | 206 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |   6 +
 src/test/regress/expected/opr_sanity.out |   2 +
 src/test/regress/expected/uuid.out       |  46 ++++-
 src/test/regress/sql/uuid.sql            |  18 +-
 6 files changed, 308 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b225ccf4f..acb655cfeb 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14127,6 +14127,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14136,12 +14144,36 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
+  </para>
+
+  <para>
+<synopsis>
+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamptz</returnvalue>
+</synopsis>
+   This function extracts a <type>timestamp with time zone</type> from UUID versions 1, 6 and 7.
+   For other versions and variants this function returns NULL. The extracted timestamp
+   does not necessarily equate to the time of UUID generation. How close it is
+   to the actual time depends on the implementation that generated the UUID.
+   The uuidv7() function provided by PostgreSQL will normally store the actual time,
+   with some exceptions: prevention of time leaps backwards and counter overflow
+   being carried to time step.
+<synopsis>
+<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function extracts the version bits from a UUID of the variant described by the
+   <ulink url="https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis">IETF standard draft</ulink>
+   (b10xx variant). For other variants this function returns NULL.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 45eb1b2fea..dfa0746fc7 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,11 +13,15 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
+#include "access/xlog.h"
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
-#include "utils/fmgrprotos.h"
+#include "utils/builtins.h"
+#include "utils/datetime.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
 #include "utils/timestamp.h"
@@ -407,6 +411,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -427,7 +437,164 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+static uint32_t sequence_counter;
+static uint64_t previous_timestamp = 0;
+
+/*
+ * Generate UUID version 7.
+ *
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * UUIDv7 Field and Bit Layout:
+ * ----------
+ *  0                   1                   2                   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                           unix_ts_ms                          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |          unix_ts_ms           |  ver  |       rand_a          |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |var|                        rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                            rand_b                             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * unix_ts_ms:
+ *  48 bit big-endian unsigned number of Unix epoch timestamp in milliseconds
+ *  as per Section 6.1. Occupies bits 0 through 47 (octets 0-5).
+ *
+ * ver:
+ *  The 4 bit version field as defined by Section 4.2, set to 0b0111 (7).
+ *  Occupies bits 48 through 51 of octet 6.
+ *
+ * rand_a:
+ *  Most significant 12 bits of 18-bit counter. This counter is designed to
+ *  guarantee additional monotonicity as per Section 6.2 (Method 1). rand_a
+ *  occupies bits 52 through 63 (octets 6-7).
+ *
+ * var:
+ *  The 2 bit variant field as defined by Section 4.1, set to 0b10. Occupies
+ *  bits 64 and 65 of octet 8.
+ *
+ * rand_b:
+ *  The first 6 bits are the least significant 6 bits of the counter. The
+ *  final 56 bits are filled with pseudo-random data to provide uniqueness as
+ *  per Section 6.9. rand_b occupies bits 66 through 127 (octets 8-15).
+ * ----------
+ *
+ * "Fixed-Length Dedicated Counter Bits" (Method 1) can be implemented with
+ * an arbitrary counter size. We choose 18 bits to reuse all the space in the
+ * bytes touched by the ver and var fields, plus the rand_a bits between them.
+ * Whenever the timestamp unix_ts_ms moves forward, the counter bits are
+ * reinitialized. Reinitialization always sets the most significant bit to 0;
+ * the other bits are initialized with random numbers. This gives us
+ * approximately 192K UUIDs within one millisecond without overflow, which
+ * ought to be enough for most practical purposes. Whenever the counter
+ * overflows, the overflow is carried into unix_ts_ms. So generating UUIDs at
+ * a rate higher than 128MHz might lead to using timestamps ahead of time.
+ *
+ * We're not using the "Replace Left-Most Random Bits with Increased Clock
+ * Precision" method (Section 6.2, Method 3) because of portability concerns.
+ * It's unclear if all supported platforms can provide reliable microsecond
+ * precision time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequencies less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ * Uniqueness of UUIDs generated at the same timestamp across different
+ * backends and/or nodes is guaranteed by using random bits.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64_t	tms;
+	struct timeval tp;
+	bool		increment_counter;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64_t) tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	increment_counter = (tms <= previous_timestamp);
+
+	if (increment_counter)
+	{
+		/*
+		 * Time did not advance from the previous generation, we must
+		 * increment counter
+		 */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char) (sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char) (sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char) (sequence_counter);
+	}
+	else
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bits are initialized as zero for the sole purpose
+		 * of guarding against counter rollovers. See section "Fixed-Length
+		 * Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32_t) uuid->data[8] & 0x3f) +
+			(((uint32_t) uuid->data[7]) << 6) +
+			(((uint32_t) uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (tms >> 40);
+	uuid->data[1] = (unsigned char) (tms >> 32);
+	uuid->data[2] = (unsigned char) (tms >> 24);
+	uuid->data[3] = (unsigned char) (tms >> 16);
+	uuid->data[4] = (unsigned char) (tms >> 8);
+	uuid->data[5] = (unsigned char) tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (time-ordered) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+#define GREGORIAN_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
 
 /*
  * Extract timestamp from UUID.
@@ -461,7 +628,40 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 6)
+	{
+		tms = ((uint64_t) uuid->data[0]) << 52;
+		tms += ((uint64_t) uuid->data[1]) << 44;
+		tms += ((uint64_t) uuid->data[2]) << 36;
+		tms += ((uint64_t) uuid->data[3]) << 28;
+		tms += ((uint64_t) uuid->data[4]) << 20;
+		tms += ((uint64_t) uuid->data[5]) << 12;
+		tms += (((uint64_t) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64_t) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64_t) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64_t) uuid->data[4]) << 8;
+		tms += ((uint64_t) uuid->data[3]) << 16;
+		tms += ((uint64_t) uuid->data[2]) << 24;
+		tms += ((uint64_t) uuid->data[1]) << 32;
+		tms += ((uint64_t) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 177d81a891..30dc2d82af 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9170,6 +9170,12 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
 { oid => '9897', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 9d047b21b8..2b5d67b7f0 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,8 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
 uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 -- restore normal output mode
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..4f010eed67 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,26 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +208,32 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v6
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..6fa039800a 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,18 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +104,13 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v1
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v6
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.37.1 (Apple Git-137.1)

#120

postgres@jeltef.nl

almost 2 years ago

In reply to: Andrey M. Borodin (#119)

Re: UUID v7

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends. With
the current counter approach between backends we only have sortability
at the millisecond level.

However, I don't really think it is incredibly important to get the
"perfect" approach to filling in rand_a/rand_b right now. As long as
we don't document what we do, we can choose to change the method
without breaking backwards compatibility. Because either approach
results in valid UUIDv7s.

#121

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Jelte Fennema-Nio (#120)

Re: UUID v7

On 21 Mar 2024, at 20:21, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.

Oh, that’s an interesting practical feature!
So, essentially, the counter is a theoretical guarantee of sortability within one backend, while microseconds give practical sortability between backends.

However, I don't really think it is incredibly important to get the
"perfect" approach to filling in rand_a/rand_b right now. As long as
we don't document what we do, we can choose to change the method
without breaking backwards compatibility. Because either approach
results in valid UUIDv7s.

Makes sense to me. I think both methods would be much better than UUIDv4 for practical reasons. And even not using the extra bits at all (filling them with random numbers) would work in 99.9% of cases.

Best regards, Andrey Borodin.

#122

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#121)

Re: UUID v7

I think it's better to leave Andrey's patch as is, and add another function in the future with a customizable UUIDv7 structure for special use cases. The structure description can be in JSON format. See this discussion.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 09:54:07 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 21 Mar 2024, at 20:21, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.

Oh, that’s an interesting practical feature!
So, essentially, the counter is a theoretical guarantee of sortability within one backend, while microseconds give practical sortability between backends.

However, I don't really think it is incredibly important to get the
"perfect" approach to filling in rand_a/rand_b right now. As long as
we don't document what we do, we can choose to change the method
without breaking backwards compatibility. Because either approach
results in valid UUIDv7s.

Makes sense to me. I think both methods would be much better than UUIDv4 for practical reasons. And even not using the extra bits at all (filling them with random numbers) would work in 99.9% of cases.

Best regards, Andrey Borodin.

#123

peter@eisentraut.org

almost 2 years ago

In reply to: Jelte Fennema-Nio (#120)

Re: UUID v7

On 21.03.24 16:21, Jelte Fennema-Nio wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.

#124

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Peter Eisentraut (#123)

Re: UUID v7

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.

#125

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Sergey Prokhorenko (#124)

Re: UUID v7

BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance
Source: Microservices Pattern: Shared database

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.

#126

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Sergey Prokhorenko (#125)

Re: UUID v7

Another source: Microservices Pattern: Database per service

A service's database is private to that service

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 04:58:59 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

BTW: Each microservice should have its own database to ensure data isolation and independence, enabling better scalability and fault tolerance
Source: Microservices Pattern: Shared database

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 04:42:20 pm GMT+3, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Why not use a single UUID generator for the database table in this case, similar to autoincrement?

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Friday, 22 March 2024 at 03:51:20 pm GMT+3, Peter Eisentraut <peter@eisentraut.org> wrote:

On 21.03.24 16:21, Jelte Fennema-Nio wrote:

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Timer-based bits contribute to global sortability. But the real timers we have are not even millisecond adjusted. We can hope for ~few ms variation in one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends.

There is that, and there are also multiple backend workers for one session.

#127

peter@eisentraut.org

almost 2 years ago

In reply to: Andrey M. Borodin (#119)

Re: UUID v7

On 20.03.24 19:08, Andrey M. Borodin wrote:

On 19 Mar 2024, at 13:55, Peter Eisentraut <peter@eisentraut.org> wrote:

On 16.03.24 18:43, Andrey M. Borodin wrote:

On 15 Mar 2024, at 14:47, Aleksander Alekseev <aleksander@timescale.com> wrote:

+1 to the idea. I doubt that anyone will miss it.

PFA v22.
Changes:
1. Squashed all editorialisation by Jelte
2. Fixed my erroneous comments on using Method 2 (we are using method 1 instead)
3. Remove all traces of uuid_extract_variant()

I have committed a subset of this for now, namely the additions of uuid_extract_timestamp() and uuid_extract_version(). These seemed mature and agreed upon. You can rebase the rest of your patch on top of that.

Great! Thank you! PFA v23 with rebase on HEAD.

I have been studying the uuidv7() function.

I find this code extremely hard to follow.

We don't need to copy all that documentation from the RFC 4122bis
document. People can read that themselves. What I would like to see is
easy-to-find information about which parts of it we are implementing. Like,

- UUID version 7
- fixed-length dedicated counter
- counter is 18 bits
- 4 bits are initialized as zero

That's more or less all I would need to know what is going on.

That said, I don't understand why you say it's an 18 bit counter, when
you overwrite 6 bits with variant and version. Then it's just a 12 bit
counter? Which is the size of the rand_a field, so that kind of makes
sense. But 12 bits is the recommended minimum, and (in this patch) we
don't use sub-millisecond timestamp precision, so we should probably use
more than the minimum?

Also, you are initializing 4 bits (I think?) to zero to guard against
counter rollovers (so it's really just an 8 bit counter?). But nothing
checks against such rollovers, so I don't understand the use of that.

The code could be organized better. In the not-increment_counter case,
you could use two separate pg_strong_random calls: One to initialize
rand_b, starting at &uuid->data[8], and one to initialize the counter.
Then the former could be shared between the two branches, and the code
to assign the sequence_counter to the uuid fields could also be shared.

I would also prefer if the normal case (not-increment_counter) were the
first if branch.

Some other notes on your patch:

- Your rebase duplicated the documentation of uuid_extract_timestamp and
uuid_extract_version.

- PostgreSQL code uses uint64 etc. instead of uint64_t etc.

- It seems the added includes

#include "access/xlog.h"
#include "utils/builtins.h"
#include "utils/datetime.h"

are not needed.

- The static variables sequence_counter and previous_timestamp could be
kept inside the uuidv7() function.

#128

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Peter Eisentraut (#127)

1 attachment(s)

Re: UUID v7

Sorry for this long reply. I was looking into refactoring around pg_strong_random() and could not decide what to do. Finally, I decided to post at least something.

On 22 Mar 2024, at 19:15, Peter Eisentraut <peter@eisentraut.org> wrote:

I have been studying the uuidv7() function.

I find this code extremely hard to follow.

We don't need to copy all that documentation from the RFC 4122bis document. People can read that themselves. What I would like to see is easy-to-find information about which parts of it we are implementing. Like,

- UUID version 7
- fixed-length dedicated counter
- counter is 18 bits
- 4 bits are initialized as zero

I've removed table taken from RFC.

That's more or less all I would need to know what is going on.

That said, I don't understand why you say it's an 18 bit counter, when you overwrite 6 bits with variant and version. Then it's just a 12 bit counter? Which is the size of the rand_a field, so that kind of makes sense. But 12 bits is the recommended minimum, and (in this patch) we don't use sub-millisecond timestamp precision, so we should probably use more than the minimum?

No, we use 4 bits in data[6], 8 bits in data[7], and 6 bits in data[8]. It's 18 total. Essentially, we use both partial bytes and the one whole byte between them.
There was a bug: we used 1 extra byte of random numbers that was not necessary. I think that's what led you to think that we use a 12-bit counter.
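
The bit layout described above can be sketched in isolation. This is only an illustration, not code from the patch: the helper names counter_store() and counter_load() are hypothetical, and the version and variant bits are set in the same step here for brevity, whereas the patch sets them separately at the end of uuidv7().

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not part of the patch): keep an 18-bit counter in
 * bytes 6..8 of a 16-byte UUID, sharing data[6] with the version nibble and
 * data[8] with the two variant bits.
 */
static void
counter_store(unsigned char *data, uint32_t counter)
{
	/* version 7 in the high nibble, top 4 counter bits in the low nibble */
	data[6] = (unsigned char) (0x70 | ((counter >> 14) & 0x0f));
	/* middle 8 counter bits fill the whole byte */
	data[7] = (unsigned char) ((counter >> 6) & 0xff);
	/* variant 0b10 in the top two bits, low 6 counter bits below */
	data[8] = (unsigned char) (0x80 | (counter & 0x3f));
}

static uint32_t
counter_load(const unsigned char *data)
{
	return ((uint32_t) (data[6] & 0x0f) << 14) |
		((uint32_t) data[7] << 6) |
		(uint32_t) (data[8] & 0x3f);
}
```

Storing a value and loading it back returns the same 18 bits, which is the 4 + 8 + 6 split described above.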

Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit counter?). But nothing checks against such rollovers, so I don't understand the use of that.

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.
Rollover is carried into time tick here:
	++sequence_counter;
	if (sequence_counter > 0x3ffff)
	{
		/* We only have 18-bit counter */
		sequence_counter = 0;
		previous_timestamp++;
	}
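
To make the carry behaviour easier to follow, here is a minimal standalone sketch of the same state machine. The struct v7_state and the function v7_next() are hypothetical names introduced for illustration; the patch keeps the state in static variables inside uuidv7() and obtains the seed from pg_strong_random().

```c
#include <stdint.h>

#define COUNTER_MAX 0x3ffff		/* 18-bit counter, as in the patch */

/* Hypothetical generator state; the patch uses static variables instead. */
typedef struct
{
	uint64_t	previous_timestamp;	/* last timestamp used, in ms */
	uint32_t	sequence_counter;	/* current 18-bit counter value */
} v7_state;

/*
 * Advance the state for one UUID. now_ms is the current clock reading and
 * seed supplies the random counter start (only the low 17 bits are used, so
 * the most significant counter bit starts at zero). Returns the timestamp
 * that should be embedded in the UUID.
 */
static uint64_t
v7_next(v7_state *st, uint64_t now_ms, uint32_t seed)
{
	if (now_ms > st->previous_timestamp)
	{
		/* clock moved forward: reseed the counter */
		st->previous_timestamp = now_ms;
		st->sequence_counter = seed & 0x1ffff;
	}
	else if (++st->sequence_counter > COUNTER_MAX)
	{
		/* counter exhausted: carry into the next millisecond */
		st->sequence_counter = 0;
		st->previous_timestamp++;
	}
	return st->previous_timestamp;
}
```

Because the returned timestamp never decreases and the counter only grows within one timestamp, values produced by a single state are ordered even if the clock stalls or steps backward.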

I think we might use 10 bits of microseconds and have 8 bits of a counter. Effect of a counter won't change much. But I'm not sure if this is allowed per RFC.
If time source is coarse-grained it still acts like a random initializer. And when it is precise - time is "natural" source of entropy.
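
As a rough sketch of that idea only: the microsecond within the current millisecond ranges over 0..999, which fits in 10 bits, leaving 8 bits of the same 18 for a counter. The function name pack_usec_counter() and the exact bit placement are assumptions made for illustration; whether the RFC permits this layout is the open question raised above.

```c
#include <stdint.h>

/*
 * Hypothetical layout for the 18 bits discussed above: the upper 10 bits
 * hold the microsecond within the current millisecond (0..999), the lower
 * 8 bits hold a counter. Not taken from the patch.
 */
static uint32_t
pack_usec_counter(uint32_t usec_in_ms, uint32_t counter)
{
	return ((usec_in_ms & 0x3ff) << 8) | (counter & 0xff);
}
```

With this placement a later microsecond always compares greater than an earlier one regardless of the counter value, so ordering between backends would depend on the clock and the counter would only break ties within one microsecond.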

The code could be organized better. In the not-increment_counter case, you could use two separate pg_strong_random calls: One to initialize rand_b, starting at &uuid->data[8], and one to initialize the counter. Then the former could be shared between the two branches, and the code to assign the sequence_counter to the uuid fields could also be shared.

A call to pg_strong_random() is very expensive in builds without ssl (and even with ssl too). If we could amortize random numbers in small buffers, that would save a lot of time (see v8-0002-Buffer-random-numbers.patch upthread). Or, perhaps, we can ignore the cost of two pg_strong_random() calls.
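
For readers who have not seen the upthread patch, the general shape of such buffering might look like the sketch below. It is not the code from v8-0002-Buffer-random-numbers.patch: random_pool, pool_random() and the buffer size are hypothetical, and demo_source() is only a stand-in that counts calls where a real build would use a strong random source.

```c
#include <stddef.h>
#include <string.h>

#define RANDOM_BUF_SIZE 64		/* hypothetical buffer size */

/* Signature of the underlying (expensive) source; returns 1 on success. */
typedef int (*random_source) (void *buf, size_t len);

typedef struct
{
	unsigned char buf[RANDOM_BUF_SIZE];
	size_t		used;			/* bytes already handed out */
	int			filled;			/* has buf been filled at least once? */
} random_pool;

/*
 * Copy len bytes (len <= RANDOM_BUF_SIZE) into dst, calling the underlying
 * source only when the pool cannot satisfy the request.
 */
static int
pool_random(random_pool *pool, random_source src, void *dst, size_t len)
{
	if (len > RANDOM_BUF_SIZE)
		return 0;
	if (!pool->filled || pool->used + len > RANDOM_BUF_SIZE)
	{
		if (!src(pool->buf, RANDOM_BUF_SIZE))
			return 0;
		pool->used = 0;
		pool->filled = 1;
	}
	memcpy(dst, pool->buf + pool->used, len);
	pool->used += len;
	return 1;
}

/* Stand-in source for demonstration only: NOT random, just counts calls. */
static int	demo_calls = 0;

static int
demo_source(void *buf, size_t len)
{
	memset(buf, 0xAB, len);
	demo_calls++;
	return 1;
}
```

With a 64-byte pool and 10-byte requests, six requests are served from one refill and the seventh triggers the next one, so the expensive call happens roughly once per six UUIDs in this sketch.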

I would also prefer if the normal case (not-increment_counter) were the first if branch.

Done.

Some other notes on your patch:

- Your rebase duplicated the documentation of uuid_extract_timestamp and uuid_extract_version.

- PostgreSQL code uses uint64 etc. instead of uint64_t etc.

- It seems the added includes

#include "access/xlog.h"
#include "utils/builtins.h"
#include "utils/datetime.h"

are not needed.

- The static variables sequence_counter and previous_timestamp could be kept inside the uuidv7() function.

Fixed.

Thanks!

Best regards, Andrey Borodin.

Attachments:

v24-0001-Implement-UUID-v7.patch (application/octet-stream)

From 916bbaf8ce0602b5a3639a0207a52a8aea78487a Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v24] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  19 ++-
 src/backend/utils/adt/uuid.c             | 165 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |   6 +
 src/test/regress/expected/opr_sanity.out |   2 +
 src/test/regress/expected/uuid.out       |  46 ++++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 6 files changed, 249 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8ecc02f2b9..763fdab535 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14127,6 +14127,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14136,12 +14144,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 45eb1b2fea..e3b42224e6 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -407,6 +409,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -427,7 +435,127 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Generate UUID version 7.
+ *
+ * Following description is taken from RFC draft and slightly extended to
+ * reflect implementation specific choices.
+ *
+ * "Fixed-Length Dedicated Counter Bits" (Method 1) can be implemented with
+ * arbitrary size of a counter. We choose size 18 to reuse all space of bytes
+ * that are touched by ver and var fields + rand_a bytes between them.
+ * Whenever timestamp unix_ts_ms is moving forward, these counter bits are
+ * reinitialized. Reinitialization always sets most significant bit to 0, other
+ * bits are initialized with random numbers. This gives us approximately 192K
+ * UUIDs within one millisecond without overflow. This ought to be enough for
+ * most practical purposes. Whenever counter overflow happens, this overflow is
+ * translated to increment of unix_ts_ms. So generation of UUIDs at a rate
+ * higher than 128MHz might lead to using timestamps ahead of time.
+ *
+ * We're not using the "Replace Left-Most Random Bits with Increased Clock
+ * Precision" method Section 6.2 (Method 3), because of portability concerns.
+ * It's unclear if all supported platforms can provide reliable microsecond
+ * precision time.
+ *
+ * All UUID generator state is backend-local. For UUIDs generated in one
+ * backend we guarantee monotonicity. UUIDs generated on different backends
+ * will be mostly monotonic if they are generated at frequencies less than 1KHz,
+ * but this monotonicity is not strictly guaranteed. UUIDs generated on
+ * different nodes are mostly monotonic with regards to possible clock drift.
+ * Uniqueness of UUIDs generated at the same timestamp across different
+ * backends and/or nodes is guaranteed by using random bits.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	static uint32 sequence_counter;
+	static uint64 previous_timestamp = 0;
+
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	uint64	tms;
+	struct timeval tp;
+	bool		time_tick_forward;
+
+	gettimeofday(&tp, NULL);
+	tms = ((uint64) tp.tv_sec) * 1000 + (tp.tv_usec) / 1000;
+	/* time from clock is protected from backward leaps */
+	time_tick_forward = (tms > previous_timestamp);
+
+	if (time_tick_forward)
+	{
+		/* fill everything after the timestamp with random bytes */
+		if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random values")));
+
+		/*
+		 * Left-most counter bit is initialized as zero for the sole purpose
+		 * of guarding against counter rollovers. See section "Fixed-Length
+		 * Dedicated Counter Seeding"
+		 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis#monotonicity_counters
+		 */
+		uuid->data[6] = (uuid->data[6] & 0xf7);
+
+		/* read randomly initialized bits of counter */
+		sequence_counter = ((uint32) uuid->data[8] & 0x3f) +
+			(((uint32) uuid->data[7]) << 6) +
+			(((uint32) uuid->data[6] & 0x0f) << 14);
+
+		previous_timestamp = tms;
+	}
+	else
+	{
+		/*
+		 * Time did not advance from the previous generation, we must
+		 * increment counter
+		 */
+		++sequence_counter;
+		if (sequence_counter > 0x3ffff)
+		{
+			/* We only have 18-bit counter */
+			sequence_counter = 0;
+			previous_timestamp++;
+		}
+
+		/* protection from leap backward */
+		tms = previous_timestamp;
+
+		/* fill everything after the timestamp and counter with random bytes */
+		if (!pg_strong_random(&uuid->data[9], UUID_LEN - 9))
+			ereport(ERROR,
+					(errcode(ERRCODE_INTERNAL_ERROR),
+					 errmsg("could not generate random values")));
+
+		/* most significant 4 bits of 18-bit counter */
+		uuid->data[6] = (unsigned char) (sequence_counter >> 14);
+		/* next 8 bits */
+		uuid->data[7] = (unsigned char) (sequence_counter >> 6);
+		/* least significant 6 bits */
+		uuid->data[8] = (unsigned char) (sequence_counter);
+	}
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (tms >> 40);
+	uuid->data[1] = (unsigned char) (tms >> 32);
+	uuid->data[2] = (unsigned char) (tms >> 24);
+	uuid->data[3] = (unsigned char) (tms >> 16);
+	uuid->data[4] = (unsigned char) (tms >> 8);
+	uuid->data[5] = (unsigned char) tms;
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+#define GREGORIAN_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
 
 /*
  * Extract timestamp from UUID.
@@ -461,7 +589,40 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 6)
+	{
+		tms = ((uint64) uuid->data[0]) << 52;
+		tms += ((uint64) uuid->data[1]) << 44;
+		tms += ((uint64) uuid->data[2]) << 36;
+		tms += ((uint64) uuid->data[3]) << 28;
+		tms += ((uint64) uuid->data[4]) << 20;
+		tms += ((uint64) uuid->data[5]) << 12;
+		tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0d26e5b422..2d8bbd3137 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9173,6 +9173,12 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
 { oid => '9897', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 9d047b21b8..2b5d67b7f0 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -872,6 +872,8 @@ xid8ge(xid8,xid8)
 xid8eq(xid8,xid8)
 xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
+uuidv4()
+uuidv7()
 uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 -- restore normal output mode
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..4f010eed67 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,26 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +208,32 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v6
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..6fa039800a 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,18 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +104,13 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v1
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v6
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 4122bis test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.37.1 (Apple Git-137.1)

#129

peter@eisentraut.org

almost 2 years ago

In reply to: Andrey M. Borodin (#128)

Re: UUID v7

On 26.03.24 18:26, Andrey M. Borodin wrote:

Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit counter?). But nothing checks against such rollovers, so I don't understand the use of that.

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.

Uh, I guess I don't understand this at all. I tried to dig up some
information about this, but didn't find anything. What exactly is the
mechanism of these "counter rollover guards"? If they don't guard
anything, what are they supposed to accomplish?

#130

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Peter Eisentraut (#129)

Re: UUID v7

On 4 Apr 2024, at 18:45, Peter Eisentraut <peter@eisentraut.org> wrote:

On 26.03.24 18:26, Andrey M. Borodin wrote:

Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit counter?). But nothing checks against such rollovers, so I don't understand the use of that.

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.

Uh, I guess I don't understand this at all. I tried to dig up some information about this, but didn't find anything. What exactly is the mechanism of these "counter rollover guards"? If they don't guard anything, what are they supposed to accomplish?

My understanding of guard bits is the following: on every UUID generation, when time is advancing, counter bits are initialized with random numbers, except guard bits. Guard bits are always initialized with zeroes.

Let's say we have a 1-byte counter with 4 guard bits and 4 normal bits.
If we generate some UUIDs within the very same millisecond we might have the following counter values:

0C <--- lower nibble is initialized with random 4 bits C.
0D
0E
0F
10
11
12

Without these guard bits we might get random numbers that are immediately at the end of the range of allowed values:

FE <--- first UUID at given millisecond
FF
00 <--- rollover to next millisecond
01

If we have 1 guard bit and 7 normal bits we get at worst 128 values before rollover to the next millisecond.
If we have 2 guard bits and 6 normal bits this guarantee is extended to 192.
3 guard bits and 5 normal bits will guarantee a capacity of 224.
But the usefulness of each additional guard bit decreases, so I think there is a point in only one.

That's my understanding of guard bits in the counter. Please correct me if I'm wrong.
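
The arithmetic above can be checked with a small standalone sketch. It is not code from the patch: seed_counter() and increments_before_rollover() are hypothetical names, and the counter is modelled as a plain integer rather than as bits inside the UUID.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seed a counter of total_bits bits from random_bits, forcing the
 * guard_bits most significant bits to zero.  (Hypothetical helper.)
 */
static uint32_t
seed_counter(uint32_t random_bits, unsigned total_bits, unsigned guard_bits)
{
	assert(total_bits >= 1 && total_bits <= 31);
	assert(guard_bits <= total_bits);

	/* the mask keeps only the low (total_bits - guard_bits) random bits */
	uint32_t	mask = (UINT32_C(1) << (total_bits - guard_bits)) - 1;

	return random_bits & mask;
}

/* Number of increments available before the counter would roll over. */
static uint32_t
increments_before_rollover(uint32_t seed, unsigned total_bits)
{
	assert(total_bits >= 1 && total_bits <= 31);

	uint32_t	max_value = (UINT32_C(1) << total_bits) - 1;

	assert(seed <= max_value);
	return max_value - seed;
}
```

With an 8-bit counter and the worst possible random seed (all ones), one guard bit leaves 128 increments, two leave 192 and three leave 224, while an unguarded seed of FE leaves only one.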

At this point we can skip the counter\microseconds entirely and just fill everything after unix_ts_ms with randomness. It's still a valid UUIDv7, exhibiting much more data locality than UUIDv4. We can adjust these sortability measures later.

Best regards, Andrey Borodin.

#131

sergeyprokhorenko@yahoo.com.au

almost 2 years ago

In reply to: Andrey M. Borodin (#130)

Re: UUID v7

For every complex problem there is an answer that is clear, simple, and wrong. Since the RFC allows microsecond timestamp granularity, the first thing that comes to everyone's mind is to insert microsecond granularity into UUIDv7. And if the RFC allowed nanosecond timestamp granularity, then they would try to insert nanosecond granularity into UUIDv7.
But I am categorically against abandoning the counter under pressure from the unfounded proposal to replace the counter with microsecond granularity.
1) The RFC specifies millisecond timestamp granularity by default.
2) All advanced UUIDv7 implementations include a counter:
• for JavaScript https://www.npmjs.com/package/uuidv7
• for Rust https://crates.io/crates/uuid7
• for Go (Golang) https://pkg.go.dev/github.com/gofrs/uuid#NewV7
• for Python https://github.com/oittaa/uuid6-python
3) The theoretical performance of generating UUIDv7 without loss of monotonicity for microsecond granularity is only 1000 UUIDv7 per millisecond. This is very low and insufficient generation performance! But the actual generation performance is even worse, since the generation demand is unevenly distributed within a millisecond. Therefore, a UUIDv7 will not be generated every microsecond.
For a counter 18 bits long, with the most significant bit initialized to zero and the remaining bits initialized to a random number, the actual performance of generating a UUIDv7 without loss of monotonicity is between 2 to the power of 17 = 131072 UUIDv7 per millisecond (if the random number happens to be all ones) and 2 to the power of 18 = 262144 UUIDv7 per millisecond (if the random number happens to be all zeros). This is more than enough.
4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.

The only reasonable use of microsecond granularity is when writing to a database table in parallel. However, monotonicity in this case can be ensured in another way, namely a single UUIDv7 generator per database table, similar to SERIAL (https://postgrespro.com/docs/postgresql/16/datatype-numeric#DATATYPE-SERIAL) in PostgreSQL.
Best regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Thursday, 4 April 2024 at 09:12:17 pm GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

...

Best regards, Andrey Borodin.

#132

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Jelte Fennema-Nio (#109)

Re: UUID v7

On 12 Mar 2024, at 20:41, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

if e.g.
the RFC got approved 2 weeks after Postgres its feature freeze

Jelte, you seem to be the visionary! I would consider participating in lotteries or betting.
The new UUID specification has been assigned RFC number 9562; it was approved by the RFC editors and is now in AUTH48 state. This means that after final approval by the authors the RFC will be published imminently. Most probably, this will happen circa 2 weeks after feature freeze :)

Best regards, Andrey Borodin.

[0]: https://www.rfc-editor.org/auth48/rfc9562

#133

sergeyprokhorenko@yahoo.com.au

over 1 year ago

In reply to: Andrey M. Borodin (#132)

Re: UUID v7

I think that for the sake of such an epoch-making thing as UUIDv7 it would be worth slightly unfreezing this feature freeze.

Best regards,

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

On Saturday, 13 April 2024 at 09:58:29 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 12 Mar 2024, at 20:41, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

if e.g.
the RFC got approved 2 weeks after Postgres its feature freeze

Best regards, Andrey Borodin.

[0]: https://www.rfc-editor.org/auth48/rfc9562

#134

michael@paquier.xyz

over 1 year ago

In reply to: Sergey Prokhorenko (#133)

Re: UUID v7

On Sat, Apr 13, 2024 at 07:07:34PM +0000, Sergey Prokhorenko wrote:

I think that for the sake of such an epoch-making thing as UUIDv7 it
would be worth slightly unfreezing this feature freeze.

A feature freeze is here to freeze things in place. This comes up
every year, and that won't happen.

New UUID is assigned RFC number 9562, it was approved by RFC editors
and is now in AUTH48 state. This means after final approval by
authors RFC will be imminently publicised. Most probably, this will
happen circa 2 weeks after feature freeze :)

[0] https://www.rfc-editor.org/auth48/rfc9562

Well, that's life. It looks like this is waiting for some final
approval, which may take some more time. I have no idea how long this
usually takes.
--
Michael

#135

x4mmm@yandex-team.ru

over 1 year ago

In reply to: Andrey M. Borodin (#132)

Re: UUID v7

On 13 Apr 2024, at 11:58, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

New UUID is assigned RFC number 9562, it was approved by RFC editors and is now in AUTH48 state.

RFC 9562 is now in AUTH48-Done state: it was approved by the authors and the editor, and should now be published.

Best regards, Andrey Borodin.

#136

[0]: https://clickhouse.com/docs/en/sql-reference/functions/uuid-functions#generateUUIDv7

x4mmm@yandex-team.ru

over 1 year ago

In reply to: Andrey M. Borodin (#135)

Re: UUID v7

On 3 May 2024, at 11:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Here's the documentation from ClickHouse [0] for their implementation. It's identical to the patch provided in this thread, with a few notable exceptions:

1. The counter is 42 bits, not 18. The counter has no guard bits; every bit is initialized with a random number on time ticks.
2. By default the counter is shared between threads. The alternative function generateUUIDv7ThreadMonotonic() provides a thread-local counter.

Thanks!

Best regards, Andrey Borodin.

#137

x4mmm@yandex-team.ru

over 1 year ago

In reply to: Andrey M. Borodin (#135)

Re: UUID v7

On 3 May 2024, at 11:18, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

RFC 9562 is now in AUTH48-Done state: it was approved by the authors and the editor, and should now be published.

It's RFC now.
https://datatracker.ietf.org/doc/rfc9562/

Best regards, Andrey Borodin.

#138

[0]: /messages/by-id/be0339cc-1ae1-4892-9445-8e6d8995a44d@eisentraut.org

x4mmm@yandex-team.ru

over 1 year ago

In reply to: Andrey M. Borodin (#137)

1 attachment(s)

Re: UUID v7

On 8 May 2024, at 18:37, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

It's RFC now.

PFA version with references to RFC instead of drafts.
In a nearby thread [0] we found out that most systems have enough precision to fill an additional 12 bits of sub-millisecond information. So I switched the implementation to this method.
We have a portable gettimeofday(), but unfortunately it gives only 10 bits of sub-millisecond information. So I created portable get_real_time_ns() for this purpose: it reads clock_gettime() on non-Windows platforms and GetSystemTimePreciseAsFileTime() on Windows.
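
As a rough illustration of the 12-bit method (a standalone sketch, not the attached patch), a nanosecond reading can be split into the 48-bit millisecond field and a 1/4096-millisecond fraction. The function names and the NS_PER_MS macro are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS UINT64_C(1000000)

/* Whole milliseconds since the Unix epoch (the unix_ts_ms field). */
static uint64_t
unix_ts_ms_from_ns(uint64_t ns)
{
	return ns / NS_PER_MS;
}

/* Sub-millisecond remainder scaled to 1/4096 ms steps; result is 0..4095. */
static uint16_t
subms_fraction_12bit(uint64_t ns)
{
	uint64_t	remainder_ns = ns % NS_PER_MS;
	uint16_t	fraction = (uint16_t) ((remainder_ns * 4096) / NS_PER_MS);

	assert(fraction < 4096);
	return fraction;
}
```

One step of the fraction is 1000000 / 4096, about 244 ns, so a clock that only reports microseconds (roughly 10 bits per millisecond) cannot populate every value of the 12-bit field.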

Thanks!

Best regards, Andrey Borodin.

Attachments:

v25-0001-Implement-UUID-v7.patch (application/octet-stream)

From aee48c9c3311af01f316e7320f42b744e2f8d0b4 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v25] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  19 ++-
 src/backend/utils/adt/uuid.c             | 161 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |   8 +-
 src/test/regress/expected/opr_sanity.out |   2 +
 src/test/regress/expected/uuid.out       |  46 ++++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 6 files changed, 242 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 785886af714..6202ce21717 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14160,6 +14160,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14169,12 +14177,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 45eb1b2fea9..dcf92bcee66 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -32,6 +34,7 @@ typedef struct
 	hyperLogLogState abbr_card; /* cardinality estimator */
 } uuid_sortsupport_state;
 
+static uint64 get_real_time_ns();
 static void string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext);
 static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
@@ -407,6 +410,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -427,12 +436,119 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return (uint64) tmp.tv_sec * UINT64CONST(1000000000) + (uint64) tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the Unix epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_test_timing indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	static uint64 previous_ns = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 ns;
+	uint64		 unix_ts_ms;
+	uint16		 increased_clock_precision;
+
+	ns = get_real_time_ns();
+	if (previous_ns >= ns)
+		ns = previous_ns + 1;
+	previous_ns = ns;
+
+	unix_ts_ms = ns / 1000000;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % 1000000) * 4096) / 1000000;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+#define GREGORIAN_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -442,7 +558,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -461,7 +577,40 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 6)
+	{
+		tms = ((uint64) uuid->data[0]) << 52;
+		tms += ((uint64) uuid->data[1]) << 44;
+		tms += ((uint64) uuid->data[2]) << 36;
+		tms += ((uint64) uuid->data[3]) << 28;
+		tms += ((uint64) uuid->data[4]) << 20;
+		tms += ((uint64) uuid->data[5]) << 12;
+		tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -473,7 +622,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -481,7 +630,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf85826..52d0eff3eb1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9207,11 +9207,17 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 9d047b21b88..7252915e812 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -874,6 +874,8 @@ xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
 uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
+uuidv4()
+uuidv7()
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed31..7c39a25224a 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,26 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +208,32 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a7..cfae3f8cd1c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,18 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +104,13 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.37.1 (Apple Git-137.1)

#139

sergeyprokhorenko@yahoo.com.au

over 1 year ago

In reply to: Andrey M. Borodin (#138)

Re: UUID v7

Dear Colleagues,

Although the uuidv7(timestamp) function clearly contradicts RFC 9562, the uuidv7(timestamp_offset) function is fully compliant with RFC 9562 and is absolutely necessary.
Here is a quote from RFC 9562 to support this statement (RFC 9562: Universally Unique IDentifiers (UUIDs)):

"Altering, Fuzzing, or Smearing:

Implementations MAY alter the actual timestamp. Some examples include security considerations around providing a real-clock value within a UUID to 1) correct inaccurate clocks, 2) handle leap seconds, or 3) obtain a millisecond value by dividing by 1024 (or some other value) for performance reasons (instead of dividing a number of microseconds by 1000). This specification makes no requirement or guarantee about how close the clock value needs to be to the actual time."

It’s written clumsily, of course, but the intention of the authors of RFC 9562 is completely clear: the current timestamp can be changed by any amount and for any reason, including security or performance reasons. The wording provides only a few examples, the list of which is certainly not exhaustive.

The motives of the authors of RFC 9562 are also clear. The timestamp is needed only to generate monotonically increasing UUIDv7. The timestamp should not be used as a source of data about the time the record was created (this is explicitly stated in section 6.12. Opacity). Therefore, the actual timestamp can and should be changed if necessary.

Why then does RFC 9562 contain wording about the need to use "Unix Epoch timestamp"? First, the authors of RFC 9562 wanted to get away from using the Gregorian calendar, which required a timestamp that was too long. Second, RFC 9562 prohibits inserting into UUIDv7 a completely arbitrary date and time value that does not increase with the passage of real time. And this is correct, since in this case the generated UUIDv7 would not be monotonically increasing. Third, on almost all computing platforms there is a convenient source of "Unix Epoch timestamp".

Why does the uuidv7() function need the optional formal parameter timestamp_offset? This question is best answered by a quote from https://lu.sagebl.eu/notes/maybe-we-dont-need-uuidv7 :

"Leaking information

UUIDv4 does not leak information assuming a proper implementation. But, UUIDv7 in fact does: the timestamp of the server is embedded into the ID. From a business point of view it discloses information about resource creation time. It may not be a problem depending on the context. Current RFC draft allows implementation to tweak timestamps a little to enforce a strict increasing order between two generations and to alleviate some security concerns."

There is a lot of hate on the internet about "UUIDv7 should not be used because it discloses the date and time the record was created." If there was a ban on changing the actual timestamp, this would prevent the use of UUIDv7 in mission-critical databases, and would generally lead to a decrease in the popularity of UUIDv7.

The implementation details of timestamp_offset are, of course, up to the developer. But I would suggest two features:

1. If the result of applying timestamp_offset to the timestamp goes beyond the permissible interval, the timestamp_offset value must be reset to zero.
2. The data type for timestamp_offset should be the developer-friendly interval type (https://postgrespro.ru/docs/postgresql/16/datatype-datetime?lang=en#DATATYPE-INTERVAL-INPUT), which allows you to enter the argument value using the words microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium.
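
One possible reading of the first suggestion, sketched in standalone C. It assumes that the "permissible interval" is the range of the 48-bit unix_ts_ms field and that the offset has already been converted to milliseconds; apply_timestamp_offset() and UNIX_TS_MS_MAX are hypothetical names, not part of any patch in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits in the 48-bit unix_ts_ms field. */
#define UNIX_TS_MS_MAX ((INT64_C(1) << 48) - 1)

/*
 * Shift now_ms by offset_ms.  If the shifted value would fall outside
 * 0..UNIX_TS_MS_MAX, the offset is treated as zero.
 */
static int64_t
apply_timestamp_offset(int64_t now_ms, int64_t offset_ms)
{
	assert(now_ms >= 0 && now_ms <= UNIX_TS_MS_MAX);

	/* check the range without computing a sum that could overflow */
	if (offset_ms > UNIX_TS_MS_MAX - now_ms || offset_ms < -now_ms)
		return now_ms;

	return now_ms + offset_ms;
}
```

Converting an interval with month or year components to milliseconds is a separate question that this sketch leaves out.
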
I really hope that timestamp_offset will be used in the uuidv7() function for PostgreSQL.

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

#140

x4mmm@yandex-team.ru

over 1 year ago

In reply to: Sergey Prokhorenko (#139)

1 attachment(s)

Re: UUID v7

On 24 Jul 2024, at 04:09, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Implementations MAY alter the actual timestamp.

Hmm… looks like we slightly misinterpreted words about clock source.
Well, that’s great, let’s get offset back.
PFA version accepting offset interval.
It works like this:
postgres=# select uuidv7(interval '-2 months');
018fc02f-0996-7136-aeb4-8936b5a516a1

postgres=# select uuid_extract_timestamp(uuidv7(interval '-2 months'));
2024-05-28 22:11:15.71+05
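
To make the round trip concrete: an offset only changes the value written to the 48-bit unix_ts_ms field, and uuid_extract_timestamp() reads that field back. The standalone sketch below is not the v26 code; uuidv7_pack() and uuidv7_unix_ts_ms() are hypothetical helpers, and the random bytes are supplied by the caller instead of coming from pg_strong_random().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN 16

/*
 * Build a version 7 UUID from a millisecond timestamp (already shifted by
 * any offset) and 10 caller-supplied random bytes.
 */
static void
uuidv7_pack(uint8_t out[UUID_LEN], uint64_t unix_ts_ms,
			const uint8_t random_bytes[10])
{
	assert(unix_ts_ms < (UINT64_C(1) << 48));

	/* 48-bit big-endian timestamp in bytes 0..5 */
	for (int i = 0; i < 6; i++)
		out[i] = (uint8_t) (unix_ts_ms >> (8 * (5 - i)));

	/* bytes 6..15 start out random */
	memcpy(&out[6], random_bytes, 10);

	out[6] = (uint8_t) ((out[6] & 0x0f) | 0x70);	/* version 7 */
	out[8] = (uint8_t) ((out[8] & 0x3f) | 0x80);	/* variant 0b10 */
}

/* Read the millisecond timestamp back out of bytes 0..5. */
static uint64_t
uuidv7_unix_ts_ms(const uint8_t uuid[UUID_LEN])
{
	uint64_t	ms = 0;

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | uuid[i];
	return ms;
}
```

Packing 1645557742000 ms gives the leading bytes 017F22E2-79B0, the same prefix as the v7 test vector used in the regression tests.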

What do you think?

Best regards, Andrey Borodin.

Attachments:

v26-0001-Implement-UUID-v7.patch (application/octet-stream)

From c4e8860b799e43830b3e1eda17ab112b00550365 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v26] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  19 ++-
 src/backend/utils/adt/uuid.c             | 174 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |  10 +-
 src/test/regress/expected/uuid.out       |  46 +++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 6 files changed, 263 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 785886af71..6202ce2171 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14160,6 +14160,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14169,12 +14177,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 45eb1b2fea..470c6fb6b4 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -32,6 +34,7 @@ typedef struct
 	hyperLogLogState abbr_card; /* cardinality estimator */
 } uuid_sortsupport_state;
 
+static uint64 get_real_time_ns();
 static void string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext);
 static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
@@ -407,6 +410,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -427,12 +436,132 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is avaiable on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	static uint64 previous_ns = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 ns;
+	uint64		 unix_ts_ms;
+	uint16 		 incresed_clock_precision;
+
+	ns = get_real_time_ns();
+	if (previous_ns >= ns)
+		ns++;
+	previous_ns = ns;
+
+	if (PG_NARGS() > 0)
+	{
+		Interval *span;
+		TimestampTz ts = (TimestampTz) (ns / 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		span = PG_GETARG_INTERVAL_P(0);
+		ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													 TimestampTzGetDatum(ts),
+													 IntervalPGetDatum(span)));
+		ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+			* 1000 + ns % 1000;
+	}
+	
+	unix_ts_ms = ns / 1000000;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	incresed_clock_precision = ((ns % 1000000) * 4096) / 1000000;
+
+	uuid->data[6] = (unsigned char) (incresed_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (incresed_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[6], UUID_LEN - 6))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+#define GREGORIAN_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -442,7 +571,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -461,7 +590,40 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 6)
+	{
+		tms = ((uint64) uuid->data[0]) << 52;
+		tms += ((uint64) uuid->data[1]) << 44;
+		tms += ((uint64) uuid->data[2]) << 36;
+		tms += ((uint64) uuid->data[3]) << 28;
+		tms += ((uint64) uuid->data[4]) << 20;
+		tms += ((uint64) uuid->data[5]) << 12;
+		tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -473,7 +635,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -481,7 +643,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf8582..568f6833c0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9207,11 +9207,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 9d047b21b8..7b308d4333 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -874,6 +875,9 @@ xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
 uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..7c39a25224 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,26 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +208,32 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..cfae3f8cd1 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,18 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +104,13 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.3 (Apple Git-146)

#141

x4mmm@yandex-team.ru

over 1 year ago

In reply to: Andrey M. Borodin (#140)

1 attachment(s)

Re: UUID v7

On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

PFA version accepting offset interval.

There was a bug: when the clock had not advanced between two calls, I was bumping the stored time by one nanosecond instead of by 1/4096 of a millisecond.
v27 fixes that.
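
To illustrate why a one-nanosecond bump is not enough, here is a minimal standalone sketch of the stepping logic. The step constant and the fraction formula are taken from the attached patch; the helper names next_uuidv7_ns() and sub_ms_fraction() are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Smallest number of ns that is guaranteed to advance the 12-bit fraction. */
#define SUB_MILLISECOND_STEP (1000000 / 4096 + 1)	/* 244 + 1 = 245 ns */

/*
 * Hypothetical helper: choose the timestamp for the next UUID, given the
 * timestamp used for the previous one and the current clock reading.
 */
static uint64_t
next_uuidv7_ns(uint64_t previous_ns, uint64_t now_ns)
{
	/* Clock did not move far enough: step forward from the previous value. */
	if (previous_ns + SUB_MILLISECOND_STEP >= now_ns)
		return previous_ns + SUB_MILLISECOND_STEP;
	return now_ns;
}

/* 12-bit sub-millisecond fraction, in units of 1/4096 ms (goes into rand_a). */
static uint16_t
sub_ms_fraction(uint64_t ns)
{
	return (uint16_t) (((ns % 1000000) * 4096) / 1000000);
}
```

With a step of 1 ns, two consecutive values usually map to the same 12-bit fraction, so the ordering would then depend only on the random tail. A step of 245 ns always moves the fraction forward by at least one unit, or rolls over into the next millisecond.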

Thanks!

Best regards, Andrey Borodin.

Attachments:

v27-0001-Implement-UUID-v7.patch (application/octet-stream)

From 599e3400278fd7b0a201a55114790efcd83840fb Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v27] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability, this commit also adds uuidv4() as an alias for
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  19 ++-
 src/backend/utils/adt/uuid.c             | 176 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |  10 +-
 src/test/regress/expected/uuid.out       |  47 +++++-
 src/test/regress/sql/uuid.sql            |  19 ++-
 6 files changed, 267 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 785886af71..6202ce2171 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14160,6 +14160,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14169,12 +14177,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 45eb1b2fea..492812fee7 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -32,6 +34,7 @@ typedef struct
 	hyperLogLogState abbr_card; /* cardinality estimator */
 } uuid_sortsupport_state;
 
+static uint64 get_real_time_ns();
 static void string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext);
 static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
@@ -407,6 +410,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -427,12 +436,134 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is avaiable on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	static uint64 previous_ns = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 ns;
+	uint64		 unix_ts_ms;
+	uint16 		 incresed_clock_precision;
+
+/* minimum amount of ns that guarantees step of incresed_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+	ns = get_real_time_ns();
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	if (PG_NARGS() > 0)
+	{
+		Interval *span;
+		TimestampTz ts = (TimestampTz) (ns / 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		span = PG_GETARG_INTERVAL_P(0);
+		ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													 TimestampTzGetDatum(ts),
+													 IntervalPGetDatum(span)));
+		ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+			* 1000 + ns % 1000;
+	}
+	
+	unix_ts_ms = ns / 1000000;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	incresed_clock_precision = ((ns % 1000000) * 4096) / 1000000;
+
+	uuid->data[6] = (unsigned char) (incresed_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (incresed_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+#define GREGORIAN_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -442,7 +573,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -461,7 +592,40 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 6)
+	{
+		tms = ((uint64) uuid->data[0]) << 52;
+		tms += ((uint64) uuid->data[1]) << 44;
+		tms += ((uint64) uuid->data[2]) << 36;
+		tms += ((uint64) uuid->data[3]) << 28;
+		tms += ((uint64) uuid->data[4]) << 20;
+		tms += ((uint64) uuid->data[5]) << 12;
+		tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -473,7 +637,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -481,7 +645,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 73d9cf8582..568f6833c0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9207,11 +9207,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 9d047b21b8..7b308d4333 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -874,6 +875,9 @@ xid8ne(xid8,xid8)
 xid8cmp(xid8,xid8)
 uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..1d221a983b 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,32 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..dd7fe4c2ef 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,13 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.3 (Apple Git-146)

#142

michael@paquier.xyz

about 1 year ago

In reply to: Andrey M. Borodin (#141)

Re: UUID v7

On Sun, Aug 04, 2024 at 03:50:37PM +0500, Andrey M. Borodin wrote:

There was a bug: when the clock had not advanced between two calls, I
was bumping the stored time by one nanosecond instead of by 1/4096 of
a millisecond.
v27 fixes that.

+static uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
[...]
+static uint64
+get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif

This part of the patch looks structurally wrong to me because we've
already spent some time refactoring the clock APIs into instr_time.h,
which deals with the cross-platform requirements for monotonic times.
In particular, on macOS we have CLOCK_MONOTONIC_RAW, and your patch
does not use it. So you should avoid calling these routines directly
and build something on the interface unified across the board, like
anywhere else. And you know, duplication.

The patch has a couple of typos, for example:
- avaiable
- incresed_clock_precision
--
Michael

#143

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#141)

Re: UUID v7

On Sun, Aug 4, 2024 at 3:51 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

PFA version accepting offset interval.

There was a bug: when the clock had not advanced between two calls, I was bumping the stored time by one nanosecond instead of by 1/4096 of a millisecond.
v27 fixes that.

Thanks!

I've reviewed the v27 patch and have some comments:

---
In datatype.sgml:

The data type <type>uuid</type> stores Universally Unique Identifiers
(UUID) as defined by <ulink
url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
ISO/IEC 9834-8:2005, and related standards.

In funcs.sgml:
This function extracts the version from a UUID of the variant described by
<ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC
4122</ulink>. For

Maybe these references to RFC 4122 need to be updated as well.

---
'git show --check' raises a warning:

src/backend/utils/adt/uuid.c:520: trailing whitespace.
+

---
+
+   if (PG_NARGS() > 0)
+   {
+       Interval *span;
+       TimestampTz ts = (TimestampTz) (ns / 1000) -
+           (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+       span = PG_GETARG_INTERVAL_P(0);
+       ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+                                                    TimestampTzGetDatum(ts),
+                                                    IntervalPGetDatum(span)));
+       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+           * 1000 + ns % 1000;
+   }

We need to add a comment to describe what/why we're doing here.
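
As an illustration of what such a comment would need to explain, the round trip can be sketched without PostgreSQL headers. This is a sketch under assumptions: the Julian-day constants are re-declared locally with the values PostgreSQL uses, shift_unix_ns() is a hypothetical name, and a plain microsecond offset stands in for timestamptz_pl_interval(), which in the real code handles days, months and years that are not a fixed number of microseconds.

```c
#include <stdint.h>

#define UNIX_EPOCH_JDATE      2440588	/* 1970-01-01 as a Julian day */
#define POSTGRES_EPOCH_JDATE  2451545	/* 2000-01-01 as a Julian day */
#define SECS_PER_DAY          86400
#define USECS_PER_SEC         INT64_C(1000000)

/* Microseconds between the Unix epoch and the PostgreSQL epoch. */
#define EPOCH_DIFF_US \
	((int64_t) (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)

/*
 * Shift a Unix-epoch nanosecond timestamp by shift_us microseconds.
 * The shift is computed on a PostgreSQL-epoch microsecond value, so the
 * sub-microsecond remainder (ns % 1000) is carried across unchanged.
 */
uint64_t
shift_unix_ns(uint64_t ns, int64_t shift_us)
{
	int64_t		pg_us = (int64_t) (ns / 1000) - EPOCH_DIFF_US;

	pg_us += shift_us;			/* stand-in for timestamptz_pl_interval() */

	return (uint64_t) (pg_us + EPOCH_DIFF_US) * 1000 + ns % 1000;
}
```

The point of the detour is precision: the interval arithmetic works in microseconds, so the nanosecond remainder has to be set aside before the call and added back afterwards.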

---
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"

Need a period at the end of this sentence.
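
For readers following along, the byte layout implied by this method can be sketched as a standalone function. This is only a sketch: pack_uuidv7() and uuidv7_unix_ms() are hypothetical names, the caller is assumed to supply the random bytes, and the 12-bit value is assumed to occupy the bits that RFC 9562 calls rand_a.

```c
#include <stdint.h>
#include <string.h>

/*
 * Assemble a version 7 UUID from a 48-bit Unix millisecond timestamp,
 * a 12-bit sub-millisecond fraction and 8 caller-supplied random bytes.
 */
void
pack_uuidv7(uint8_t out[16], uint64_t unix_ts_ms, uint16_t frac12,
			const uint8_t rand8[8])
{
	/* 48-bit big-endian millisecond timestamp */
	out[0] = (uint8_t) (unix_ts_ms >> 40);
	out[1] = (uint8_t) (unix_ts_ms >> 32);
	out[2] = (uint8_t) (unix_ts_ms >> 24);
	out[3] = (uint8_t) (unix_ts_ms >> 16);
	out[4] = (uint8_t) (unix_ts_ms >> 8);
	out[5] = (uint8_t) unix_ts_ms;

	/* version nibble 0111, then the 12-bit fraction */
	out[6] = (uint8_t) (0x70 | ((frac12 >> 8) & 0x0f));
	out[7] = (uint8_t) frac12;

	/* random tail, with the top two bits forced to the 10 variant */
	memcpy(&out[8], rand8, 8);
	out[8] = (uint8_t) ((out[8] & 0x3f) | 0x80);
}

/* Read the 48-bit millisecond timestamp back out of a v7 UUID. */
uint64_t
uuidv7_unix_ms(const uint8_t u[16])
{
	return ((uint64_t) u[0] << 40) | ((uint64_t) u[1] << 32) |
		((uint64_t) u[2] << 24) | ((uint64_t) u[3] << 16) |
		((uint64_t) u[4] << 8) | (uint64_t) u[5];
}
```

Because the timestamp and the fraction sit in the most significant bytes, byte-wise comparison of two UUIDs generated this way follows generation order as long as the timestamp source never goes backwards.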

---
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },

Both functions have the same description but work differently. I think
it's better to clarify the description of uuidv7() that takes an
interval.

---
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)

I think that we need to change these functions so that this check
query doesn't return anything, no?

---
+   if (version == 6)
+   {
+       tms = ((uint64) uuid->data[0]) << 52;
+       tms += ((uint64) uuid->data[1]) << 44;
+       tms += ((uint64) uuid->data[2]) << 36;
+       tms += ((uint64) uuid->data[3]) << 28;
+       tms += ((uint64) uuid->data[4]) << 20;
+       tms += ((uint64) uuid->data[5]) << 12;
+       tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+       tms += ((uint64) uuid->data[7]);
+
+       /* convert 100-ns intervals to us, then adjust */
+       ts = (TimestampTz) (tms / 10) -
+           ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+       PG_RETURN_TIMESTAMPTZ(ts);
+   }

It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
spite of not supporting UUID v6 generation. I think it makes more
sense to support UUID v6 generation as well, if the need for it is
high.
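
For reference, the extraction logic quoted above can be checked against the RFC 9562 version 6 test vector outside the server. The sketch below is an assumption-laden standalone version: the function names and the GREGORIAN_TO_UNIX_100NS constant are hypothetical, and it returns Unix-epoch microseconds rather than a PostgreSQL TimestampTz.

```c
#include <stdint.h>

/* 100-ns intervals between 1582-10-15 and 1970-01-01 (141427 days). */
#define GREGORIAN_TO_UNIX_100NS INT64_C(122192928000000000)

/* Reassemble the 60-bit timestamp of a v6 UUID, skipping the version nibble. */
uint64_t
uuidv6_timestamp_100ns(const uint8_t u[16])
{
	uint64_t	t;

	t = (uint64_t) u[0] << 52;
	t += (uint64_t) u[1] << 44;
	t += (uint64_t) u[2] << 36;
	t += (uint64_t) u[3] << 28;
	t += (uint64_t) u[4] << 20;
	t += (uint64_t) u[5] << 12;
	t += (uint64_t) (u[6] & 0x0f) << 8;
	t += (uint64_t) u[7];
	return t;
}

/* Convert to microseconds since the Unix epoch. */
int64_t
uuidv6_unix_us(const uint8_t u[16])
{
	return ((int64_t) uuidv6_timestamp_100ns(u) - GREGORIAN_TO_UNIX_100NS) / 10;
}
```

Fed the bytes of 1EC9414C-232A-6B00-B3C8-9F6BDECED846, this yields 1645557742000000 microseconds, that is 2022-02-22 19:22:22 UTC.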

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#144

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Michael Paquier (#142)

Re: UUID v7

On 16 Oct 2024, at 11:05, Michael Paquier <michael@paquier.xyz> wrote:

This part of the patch looks structurally wrong to me because we've
already spent some time refactoring the clock APIs into instr_time.h,
which deals with cross-platform requirements for monotonic times.
In particular, on macOS we have CLOCK_MONOTONIC_RAW, and your patch
does not use it. So you should avoid calling these routines directly, and
build something using the interface unified across the board, like
everywhere else. And, you know, duplication.

Thanks for looking!
Actually, CLOCK_MONOTONIC_RAW on macOS was exactly the problem: this clock has nothing to do with astronomical (wall-clock) time, and we must put real UTC time into the UUID.
I'd be happy to reuse the instr_time.h infrastructure, but it just does not fit the purpose: it is optimized for measuring time spans.
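
The distinction can be illustrated with a small POSIX-only sketch. The function names are hypothetical, and this is not what instr_time.h or the patch does verbatim; it only shows that CLOCK_REALTIME counts from the Unix epoch, while a monotonic clock counts from an unspecified starting point and is therefore only meaningful for differences.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds, widening before the multiplication. */
static uint64_t
timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t) ts->tv_sec * UINT64_C(1000000000) + (uint64_t) ts->tv_nsec;
}

/* Wall-clock time: nanoseconds since 1970-01-01 UTC, or 0 on failure. */
uint64_t
wall_clock_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
		return 0;
	return timespec_to_ns(&ts);
}

/* Monotonic time: nanoseconds since an unspecified point, or 0 on failure. */
uint64_t
monotonic_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return timespec_to_ns(&ts);
}
```

Only the first value can be written into the timestamp field of a UUID; the second is suited to measuring elapsed time between two readings.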

Best regards, Andrey Borodin.

#145

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#143)

1 attachment(s)

Re: UUID v7

Thanks for the review!

On 18 Oct 2024, at 02:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Aug 4, 2024 at 3:51 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

PFA version accepting offset interval.

There was a bug: when time was not moving on, I was updating used time by a nanosecond, instead of 1/4096 of millisecond.
V27 fixes that.

Thanks!

I've reviewed the v27 patch and have some comments:

---
in datatype.sgml:

The data type <type>uuid</type> stores Universally Unique Identifiers
(UUID) as defined by <ulink
url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
ISO/IEC 9834-8:2005, and related standards.

In funcs.sgml:
This function extracts the version from a UUID of the variant described by
<ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC
4122</ulink>. For

Maybe these references to RFC 4122 need to be updated as well.

Fixed.

---
'git show --check' raises a warning:

Fixed.

src/backend/utils/adt/uuid.c:520: trailing whitespace.
+

---
+
+   if (PG_NARGS() > 0)
+   {
+       Interval *span;
+       TimestampTz ts = (TimestampTz) (ns / 1000) -
+           (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+       span = PG_GETARG_INTERVAL_P(0);
+       ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+                                                    TimestampTzGetDatum(ts),
+                                                    IntervalPGetDatum(span)));
+       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+           * 1000 + ns % 1000;
+   }

We need to add a comment to describe what/why we're doing here.

Done.

---
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"

Need a period at the end of this sentence.

Fixed.

---
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },

Both functions have the same description but work differently. I think
it's better to clarify the description of uuidv7() that takes an
interval.

I've slightly extended the description: now it's 'generate UUID version 7 with a timestamp shifted on specific interval'. Perhaps we can come up with something better.

---
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
I think that we need to change these functions so that this check
query doesn't return anything, no?

We have 4 options:
0. Remove uuidv7(interval). But it brings important functionality to the table: we can avoid contention points when inserting data in bulk.
1. Give different names to uuidv7() and uuidv7(interval).
2. Allow importing pg_node_tree (see v7 of the patch).
3. Change this query. The comment on this query suggests that it checks for exactly this case: the same function declared with a different number of arguments.

IMO approach number 3 is best. However, I do not understand why this query check was introduced in the first place. Maybe there are strong arguments why we should not have same-named functions with a different number of arguments.

---
+   if (version == 6)
+   {
+       tms = ((uint64) uuid->data[0]) << 52;
+       tms += ((uint64) uuid->data[1]) << 44;
+       tms += ((uint64) uuid->data[2]) << 36;
+       tms += ((uint64) uuid->data[3]) << 28;
+       tms += ((uint64) uuid->data[4]) << 20;
+       tms += ((uint64) uuid->data[5]) << 12;
+       tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+       tms += ((uint64) uuid->data[7]);
+
+       /* convert 100-ns intervals to us, then adjust */
+       ts = (TimestampTz) (tms / 10) -
+           ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+       PG_RETURN_TIMESTAMPTZ(ts);
+   }

The RFC urges the use of UUIDv7 instead of UUIDv6 when possible. I'm fine with providing an implementation; it's trivial. PFA a patch with the implementation.

Best regards, Andrey Borodin.

Attachments:

v28-0001-Implement-UUID-v7.patch (application/octet-stream)

From a10890c0fd8f7891579a3a49132bab53ac892212 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v28] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 +-
 src/backend/utils/adt/uuid.c             | 239 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  14 +-
 src/test/regress/expected/opr_sanity.out |  11 +-
 src/test/regress/expected/uuid.out       |  63 +++++-
 src/test/regress/sql/uuid.sql            |  26 ++-
 7 files changed, 358 insertions(+), 18 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f1..3e6751d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7be0324..0483cc7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23..12052bf 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -32,6 +34,7 @@ typedef struct
 	hyperLogLogState abbr_card; /* cardinality estimator */
 } uuid_sortsupport_state;
 
+static uint64 get_real_time_ns();
 static void string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext);
 static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
@@ -401,6 +404,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,7 +422,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * http://tools.ietf.org/html/rfc9562#section-4.4
 	 */
 	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
 	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
@@ -421,12 +430,195 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	static uint64 previous_ns = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 ns;
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+	ns = get_real_time_ns();
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	if (PG_NARGS() > 0)
+	{
+		/*
+		 * We are given a time shift interval as an argument.
+		 * The interval represent days, monthes and years, that are not fixed
+		 * number of nanoseconds. To make correct computations we call
+		 * timestamptz_pl_interval() with corresponding logic. This logic is
+		 * implemented with microsecond precision. So we carry nanoseconds
+		 * between computations.
+		 */
+		Interval *span;
+		/* Convert time part of UUID to Timestamptz (ms since Postgres epoch) */
+		TimestampTz ts = (TimestampTz) (ns / 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		span = PG_GETARG_INTERVAL_P(0);
+		/* Copmute time shift */
+		ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													 TimestampTzGetDatum(ts),
+													 IntervalPGetDatum(span)));
+		/* Convert TimestampTz back and carry nanoseconds. */
+		ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+			* 1000 + ns % 1000;
+	}
+
+	unix_ts_ms = ns / 1000000;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % 1000000) * 4096) / 1000000;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
+
+Datum
+uuidv6(PG_FUNCTION_ARGS)
+{
+	static uint64 previous_ns = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 ns;
+	uint64		 ts_ms;
+
+	ns = get_real_time_ns(); /* Unix epoch */
+	if (previous_ns + 100 >= ns)
+		ns = previous_ns + 100;
+	previous_ns = ns;
+
+	ts_ms = ns / 100; /* Number of 100ns intervals */
+
+	ts_ms = ts_ms + (UNIX_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC * 10;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (ts_ms >> 52);
+	uuid->data[1] = (unsigned char) (ts_ms >> 44);
+	uuid->data[2] = (unsigned char) (ts_ms >> 36);
+	uuid->data[3] = (unsigned char) (ts_ms >> 28);
+	uuid->data[4] = (unsigned char) (ts_ms >> 20);
+	uuid->data[5] = (unsigned char) (ts_ms >> 12);
+	uuid->data[6] = (unsigned char) (ts_ms >> 8); /* 4 bits will be overwritten */
+	uuid->data[7] = (unsigned char) ts_ms;
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 6" UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 0 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x60;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +628,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +647,40 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 6)
+	{
+		tms = ((uint64) uuid->data[0]) << 52;
+		tms += ((uint64) uuid->data[1]) << 44;
+		tms += ((uint64) uuid->data[2]) << 36;
+		tms += ((uint64) uuid->data[3]) << 28;
+		tms += ((uint64) uuid->data[4]) << 20;
+		tms += ((uint64) uuid->data[5]) << 12;
+		tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+		tms += ((uint64) uuid->data[7]);
+
+		/* convert 100-ns intervals to us, then adjust */
+		ts = (TimestampTz) (tms / 10) -
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +692,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +700,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1ec0d6f..df324b3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9340,11 +9340,23 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },
+{ oid => '9899', descr => 'generate UUID version 6',
+  proname => 'uuidv6', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv6' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd..fe167b5 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,10 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname 
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -878,6 +879,10 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
+uuidv6()
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15..677ee9b 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,37 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- test of uuidv6()
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv6());
+INSERT INTO guid1 (guid_field) VALUES (uuidv6());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +219,38 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv6()); --6
+ uuid_extract_version 
+----------------------
+                    6
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d0..c1e8dd6 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,25 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test of uuidv6()
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv6());
+INSERT INTO guid1 (guid_field) VALUES (uuidv6());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +111,14 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv6()); --6
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('1EC9414C-232A-6B00-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v6
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)

#146

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#145)

Re: UUID v7

On Sat, Oct 26, 2024 at 10:05 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

---
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname | oid  | proname
+------+---------+------+---------
+ 9896 | uuidv7  | 9897 | uuidv7
+(1 row)
I think that we need to change these functions so that this check
query doesn't return anything, no?
We have 4 options:
0. Remove uuidv7(interval). But it brings important functionality to the table: we can avoid contention points when inserting data in bulk.
1. Give different names to uuidv7() and uuidv7(interval).
2. Allow importing pg_node_tree (see v7 of the patch).
3. Change this query. The comment on this query suggests that it checks for exactly this case: the same function declared with a different number of arguments.

IMO approach number 3 is best. However, I do not understand why this query check was introduced in the first place. Maybe there are strong arguments why we should not have same-named functions with a different number of arguments.

I think we typically avoid this kind of check failure by assigning
uuidv7() and uuidv7(interval) different C functions that call the
common function. That is, we have pg_proc entries like:

{ oid => '9896', descr => 'generate UUID version 7',
  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },

And then have C code like:

static Datum
generate_uuidv7(FunctionCallInfo fcinfo)
{
    static uint64 previous_ns = 0;
    :
    :
    PG_RETURN_UUID_P(uuid);
}

Datum
uuidv7(PG_FUNCTION_ARGS)
{
    return generate_uuidv7(fcinfo);
}

Datum
uuidv7_interval(PG_FUNCTION_ARGS)
{
    return generate_uuidv7(fcinfo);
}

---
+   if (version == 6)
+   {
+       tms = ((uint64) uuid->data[0]) << 52;
+       tms += ((uint64) uuid->data[1]) << 44;
+       tms += ((uint64) uuid->data[2]) << 36;
+       tms += ((uint64) uuid->data[3]) << 28;
+       tms += ((uint64) uuid->data[4]) << 20;
+       tms += ((uint64) uuid->data[5]) << 12;
+       tms += (((uint64) uuid->data[6]) & 0xf) << 8;
+       tms += ((uint64) uuid->data[7]);
+
+       /* convert 100-ns intervals to us, then adjust */
+       ts = (TimestampTz) (tms / 10) -
+           ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+       PG_RETURN_TIMESTAMPTZ(ts);
+   }
It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
spite of not supporting UUID v6 generation. I think it makes more
sense to support UUID v6 generation as well, if the need for it is
high.
RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA patch with implementation.

My point is that we should either support full functionality for
UUIDv6 (such as generation and extraction) or none of them. I'm not
really sure we want UUIDv6 as well, but if we want it, it should be
implemented in a separate patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#147

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#146)

1 attachment(s)

Re: UUID v7

On 31 Oct 2024, at 22:15, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Oct 26, 2024 at 10:05 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I think we typically avoid this kind of check failure by assigning
uuidv7() and uuidv7(interval) different C functions that call the
common function. That is, we have pg_proc entries like:

Done.

It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
spite of not supporting UUID v6 generation. I think it makes more
sense to support UUID v6 generation as well, if the need for it is
high.

RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA patch with implementation.

My point is that we should either support full functionality for
UUIDv6 (such as generation and extraction) or none of them. I'm not
really sure we want UUIDv6 as well, but if we want it, it should be
implemented in a separate patch.

Makes sense. I've removed all traces of v6.

Thanks!

Best regards, Andrey Borodin.

Attachments:

v29-0001-Implement-UUID-v7.patch (application/octet-stream)

From 7f18793317ca83f84ffd70c83c929a615e100435 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v29] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability this commit also adds uuidv4() as an alias for
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 193 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  41 ++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 7 files changed, 274 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f1..3e6751d 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7be0324..0483cc7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23..af9d12c 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -32,6 +34,7 @@ typedef struct
 	hyperLogLogState abbr_card; /* cardinality estimator */
 } uuid_sortsupport_state;
 
+static uint64 get_real_time_ns(void);
 static void string_to_uuid(const char *source, pg_uuid_t *uuid, Node *escontext);
 static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
@@ -401,6 +404,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,7 +422,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * http://tools.ietf.org/html/rfc9562#section-4.4
 	 */
 	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
 	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
@@ -421,12 +430,167 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns(void)
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return (uint64) tmp.tv_sec * UINT64CONST(1000000000) + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the Unix epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * Return the current wall-clock time as nanoseconds since the Unix epoch.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns(void)
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_test_timing indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+static Datum
+generate_uuidv7(FunctionCallInfo fcinfo)
+{
+	static uint64 previous_ns = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 ns;
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+	ns = get_real_time_ns();
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	if (PG_NARGS() > 0)
+	{
+		/*
+		 * We are given a time shift interval as an argument.
+		 * The interval represents days, months and years, which are not a fixed
+		 * number of nanoseconds. To make correct computations we call
+		 * timestamptz_pl_interval() with corresponding logic. This logic is
+		 * implemented with microsecond precision. So we carry nanoseconds
+		 * between computations.
+		 */
+		Interval *span;
+		/* Convert time part of UUID to TimestampTz (us since Postgres epoch) */
+		TimestampTz ts = (TimestampTz) (ns / 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		span = PG_GETARG_INTERVAL_P(0);
+		/* Compute time shift */
+		ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													 TimestampTzGetDatum(ts),
+													 IntervalPGetDatum(span)));
+		/* Convert TimestampTz back and carry nanoseconds. */
+		ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+			* 1000 + ns % 1000;
+	}
+
+	unix_ts_ms = ns / 1000000;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % 1000000) * 4096) / 1000000;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Entry point for uuidv7()
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(fcinfo);
+}
+
+/*
+ * Entry point for uuidv7(interval)
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(fcinfo);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +600,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +619,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +646,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +654,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1ec0d6f..3c426ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9340,11 +9340,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd..43e7180 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15..aa6224e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d0..eec7f16 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,12 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)
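
For readers who want to check the version 7 branch of uuid_extract_timestamp() by hand, here is a minimal stand-alone sketch. It is not part of the patch: the helper names are hypothetical and it works on a plain 16-byte array instead of pg_uuid_t. It only reassembles the 48-bit millisecond field and reads the version and variant bits, using the same byte positions as the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: version is the top nibble of byte 6. */
static int
uuid_version(const uint8_t uuid[16])
{
    assert(uuid != NULL);
    return uuid[6] >> 4;
}

/* Hypothetical helper: the variant checked by the patch has top bits 1, 0. */
static int
uuid_has_rfc_variant(const uint8_t uuid[16])
{
    assert(uuid != NULL);
    return (uuid[8] & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: reassemble the big-endian 48-bit field in
 * bytes 0..5, which for version 7 holds milliseconds since the Unix epoch.
 */
static uint64_t
uuidv7_unix_ms(const uint8_t uuid[16])
{
    assert(uuid != NULL);
    return ((uint64_t) uuid[0] << 40) |
           ((uint64_t) uuid[1] << 32) |
           ((uint64_t) uuid[2] << 24) |
           ((uint64_t) uuid[3] << 16) |
           ((uint64_t) uuid[4] << 8) |
           (uint64_t) uuid[5];
}
```

Fed the bytes of 017F22E2-79B0-7CC3-98C4-DC0C0C07398F, the v7 test vector used in the regression test, the millisecond field comes out as 1645557742000. Converting that to a TimestampTz still needs the epoch adjustment done in the patch.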

#148

Stepan Neretin

sndcppg@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#147)

Re: UUID v7

I think we typically avoid this kind of check failure by assigning
uuidv7() and uuidv7(interval) different C functions that call the
common function. That is, we have pg_proc entries like:

Done.

It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
spite of not supporting UUID v6 generation. I think it makes more
sense to support UUID v6 generation as well, if the need for it is
high.

RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA patch with implementation.

My point is that we should either support full functionality for
UUIDv6 (such as generation and extraction) or none of them. I'm not
really sure we want UUIDv6 as well, but if we want it, it should be
implemented in a separate patch.

Make sense. I've removed all traces of v6.

Hi there,

Firstly, I'd like to discuss the increased_clock_precision variable, which
currently divides the timestamp into milliseconds and nanoseconds. However,
this approach only approximates the extra bits for sub-millisecond
precision, leading to imprecise timestamps in high-frequency UUID
generation.

To address this issue, we could consider using a more accurate method for
calculating the timestamp. For instance, we could utilize a higher
resolution clock or implement a more precise algorithm to ensure accurate
timestamps.

Additionally, it would be beneficial to add validation checks for the
interval argument. These checks could verify that the input interval is
within reasonable bounds and that the calculated timestamp is accurate.
Examples of checks could include verifying if the interval is too small,
too large, or exceeds the maximum possible number of milliseconds and
nanoseconds in a timestamp.

By implementing these changes, we can improve the accuracy and reliability
of UUID generation, making it more suitable for high-frequency usage
scenarios.

What do you think about these suggestions? Let me know your thoughts!

Best Regards, Stepan Neretin!

#149

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#147)

Re: UUID v7

On Thu, Oct 31, 2024 at 11:46 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 31 Oct 2024, at 22:15, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Oct 26, 2024 at 10:05 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

I think we typically avoid this kind of check failure by assigning
uuidv7() and uuidv7(interval) different C functions that call the
common function. That is, we have pg_proc entries like:

Done.

It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
spite of not supporting UUID v6 generation. I think it makes more
sense to support UUID v6 generation as well, if the need for it is
high.

RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA patch with implementation.

My point is that we should either support full functionality for
UUIDv6 (such as generation and extraction) or none of them. I'm not
really sure we want UUIDv6 as well, but if we want it, it should be
implemented in a separate patch.

Make sense. I've removed all traces of v6.

Thank you for updating the patch.

I've been studying UUID v7 and have a question about the current (v29)
implementation. IIUC the current implementation uses
nanosecond-precision timestamps for both the first 48 bit space and 12
bits of pseudorandom data space (referred as 'rand_a' space in RFC
9562). IOW, all data except for random, version, and variant parts
consist of a timestamp. The nanosecond-precision timestamp is
generated using clock_gettime() with CLOCK_REALTIME on Linux, which
however could be affected by time adjustment by NTP. Therefore, if the
system clock moves backward due to NTP, we cannot guarantee
monotonicity and sortability. Is that right?
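
To illustrate the layout described above, here is a minimal sketch of how a nanosecond timestamp could be split into the 48-bit millisecond field and the 12-bit 'rand_a' field. The function name and the plain byte array are assumptions made for the example. The arithmetic mirrors the v29 patch, but this is not the patch code itself, and bytes 8..15 (variant plus random data) are left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: fill bytes 0..7 of a
 * 16-byte UUID buffer from a nanosecond Unix timestamp.
 */
static void
uuidv7_fill_time(uint8_t uuid[16], uint64_t unix_ns)
{
    uint64_t unix_ts_ms = unix_ns / 1000000;
    /* 12-bit fraction of the current millisecond, in 1/4096 ms units */
    uint16_t frac = (uint16_t) (((unix_ns % 1000000) * 4096) / 1000000);

    assert(frac < 4096);

    /* 48-bit big-endian millisecond timestamp */
    uuid[0] = (uint8_t) (unix_ts_ms >> 40);
    uuid[1] = (uint8_t) (unix_ts_ms >> 32);
    uuid[2] = (uint8_t) (unix_ts_ms >> 24);
    uuid[3] = (uint8_t) (unix_ts_ms >> 16);
    uuid[4] = (uint8_t) (unix_ts_ms >> 8);
    uuid[5] = (uint8_t) unix_ts_ms;

    /* version nibble 0b0111, then the top 4 bits of the fraction */
    uuid[6] = (uint8_t) (0x70 | (frac >> 8));
    /* low 8 bits of the fraction */
    uuid[7] = (uint8_t) (frac & 0xff);
}
```

With this split, everything before the variant bits is derived from the clock, which is why the behavior of the clock source matters for ordering.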

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#150

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#149)

Re: UUID v7

On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, if the
system clock moves backward due to NTP, we cannot guarantee
monotonicity and sortability. Is that right?

Not exactly. Monotonicity is ensured for a given backend. We make sure that the timestamp advances by at least ~250 ns on each UUID generation, so the 60 bits of time are unique and ascending for a given backend.
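
A stand-alone sketch of that per-backend rule follows. The function name is hypothetical and, unlike the patch, the state is passed in explicitly instead of living in a static variable, so the example can be exercised with made-up clock readings. The step constant and the comparison are the same as in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest step in ns that always changes the 12-bit sub-ms fraction. */
#define SUB_MILLISECOND_STEP (1000000 / 4096 + 1)

/*
 * Hypothetical stand-alone version of the clamp: return a timestamp that
 * is at least SUB_MILLISECOND_STEP ns later than the previous one, even
 * if the clock reading stalled or moved backward.
 */
static uint64_t
next_monotonic_ns(uint64_t *previous_ns, uint64_t now_ns)
{
    assert(previous_ns != NULL);

    if (*previous_ns + SUB_MILLISECOND_STEP >= now_ns)
        now_ns = *previous_ns + SUB_MILLISECOND_STEP;
    *previous_ns = now_ns;
    return now_ns;
}
```

For example, if the previous value was 1000000 ns and the clock now reports 999000 ns, the function returns 1000245 ns. The guarantee only covers calls that share the same state, which corresponds to a single backend.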

Thanks!

Best regards, Andrey Borodin.

#151

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#150)

Re: UUID v7

On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, if the
system clock moves backward due to NTP, we cannot guarantee
monotonicity and sortability. Is that right?

Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for ~250ns forward on each UUID generation. 60 bits of time are unique and ascending for a given backend.

Thank you for your explanation. I now understand this code guarantees
the monotonicity:

+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+       ns = get_real_time_ns();
+       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+               ns = previous_ns + SUB_MILLISECOND_STEP;
+       previous_ns = ns;

I think that one of the most important parts in UUIDv7 implementation
is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
the monotonicity. The current patch employs method 3 with the
assumption that 12 bits of sub-millisecond information is available on
most of the systems we support. However, as far as I tested, on macOS,
values returned by clock_gettime(CLOCK_REALTIME) have only microsecond
precision, meaning that we could waste some randomness. Has this point
been considered?
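
The concern can be made concrete with a small sketch. The function names are hypothetical. It counts how many of the 4096 possible 12-bit fractions are reachable when the clock only ticks every step_ns nanoseconds, using the same fraction formula as the patch. It deliberately ignores the SUB_MILLISECOND_STEP clamp, which can produce in-between values when UUIDs are generated faster than the clock ticks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same formula as the patch: 1/4096 ms units within the current ms. */
static uint16_t
sub_ms_fraction(uint64_t ns)
{
    return (uint16_t) (((ns % 1000000) * 4096) / 1000000);
}

/*
 * Hypothetical helper: count the distinct fractions seen within one
 * millisecond for a clock whose readings are multiples of step_ns.
 */
static int
count_distinct_fractions(uint64_t step_ns)
{
    unsigned char seen[4096];
    int           distinct = 0;
    uint64_t      ns;

    assert(step_ns > 0);
    memset(seen, 0, sizeof(seen));

    for (ns = 0; ns < 1000000; ns += step_ns)
    {
        uint16_t frac = sub_ms_fraction(ns);

        if (!seen[frac])
        {
            seen[frac] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

With a 1 ns step every one of the 4096 values is reachable, while a 1000 ns step reaches only 1000 of them. So on a microsecond-resolution clock roughly two of the twelve bits carry no clock information.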

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#152

[0]: /messages/by-id/212C2E24-32CF-400E-982E-A446AB21E8CC@yandex-team.ru

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#151)

Re: UUID v7

On 31 Oct 2024, at 23:04, Stepan Neretin <sndcppg@gmail.com> wrote:

Firstly, I'd like to discuss the increased_clock_precision variable, which
currently divides the timestamp into milliseconds and nanoseconds. However,
this approach only approximates the extra bits for sub-millisecond
precision, leading to imprecise timestamps in high-frequency UUID
generation.

No, the timestamp is taken in nanoseconds, and we keep a precision of 1/4096 of a millisecond. If you observe precision loss anywhere, let me know.

To address this issue, we could consider using a more accurate method for
calculating the timestamp. For instance, we could utilize a higher
resolution clock or implement a more precise algorithm to ensure accurate
timestamps.

That's what we do.

Additionally, it would be beneficial to add validation checks for the
interval argument. These checks could verify that the input interval is
within reasonable bounds and that the calculated timestamp is accurate.
Examples of checks could include verifying if the interval is too small,
too large, or exceeds the maximum possible number of milliseconds and
nanoseconds in a timestamp.

timestamptz_pl_interval() is already doing this.
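
For context, the shift in the patch converts the nanosecond Unix time to a microsecond TimestampTz, applies the interval, and converts back while carrying the leftover nanoseconds. A minimal sketch of that round trip is below. All names are hypothetical, and a plain microsecond addition stands in for timestamptz_pl_interval(), so it neither handles day or month components nor reproduces that function's range checks.

```c
#include <assert.h>
#include <stdint.h>

/* (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) = 10957 days, in microseconds */
#define UNIX_TO_PG_EPOCH_USECS INT64_C(946684800000000)

/* ns since 1970-01-01 -> us since 2000-01-01 (sub-us part is dropped) */
static int64_t
unix_ns_to_pg_us(uint64_t unix_ns)
{
    return (int64_t) (unix_ns / 1000) - UNIX_TO_PG_EPOCH_USECS;
}

/* us since 2000-01-01 plus the carried nanoseconds -> ns since 1970-01-01 */
static uint64_t
pg_us_to_unix_ns(int64_t pg_us, uint64_t carry_ns)
{
    assert(carry_ns < 1000);
    assert(pg_us + UNIX_TO_PG_EPOCH_USECS >= 0);
    return (uint64_t) (pg_us + UNIX_TO_PG_EPOCH_USECS) * 1000 + carry_ns;
}

/* Shift by a fixed number of microseconds, preserving the ns remainder. */
static uint64_t
shift_unix_ns(uint64_t unix_ns, int64_t shift_us)
{
    int64_t pg_us = unix_ns_to_pg_us(unix_ns);

    pg_us += shift_us;      /* stand-in for timestamptz_pl_interval() */
    return pg_us_to_unix_ns(pg_us, unix_ns % 1000);
}
```

Shifting 1645557742000000123 ns by one day (86400000000 us) gives 1645644142000000123 ns, so the trailing 123 ns survive the microsecond-precision step.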

What do you think about these suggestions? Let me know your thoughts!

Thanks a lot for reviewing the patch!

On 1 Nov 2024, at 10:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, if the
system clock moves backward due to NTP, we cannot guarantee
monotonicity and sortability. Is that right?

Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for ~250ns forward on each UUID generation. 60 bits of time are unique and ascending for a given backend.

Thank you for your explanation. I now understand this code guarantees
the monotonicity:
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+       ns = get_real_time_ns();
+       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+               ns = previous_ns + SUB_MILLISECOND_STEP;
+       previous_ns = ns;
I think that one of the most important parts in UUIDv7 implementation
is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
the monotonicity. The current patch employs method 3 with the
assumption that 12 bits of sub-millisecond information is available on
most of the systems we support. However, as far as I tested, on MacOS,
values returned by clock_gettime(CLOCK_REALTIME) are only microsecond
precision, meaning that we could waste some randomness. Has this point
been considered?

There was a thread "What is a typical precision of gettimeofday()?" [0].
There we found out that the routines of instr_time.h are precise enough. On my machine (MacBook Air M3) I do not observe significant differences between CLOCK_MONOTONIC_RAW and CLOCK_REALTIME in pg_test_timing results.

CLOCK_MONOTONIC_RAW
x4mmm@x4mmm-osx bin % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 15.30 ns
Histogram of timing durations:
  < us   % of total      count
     1     98.47856  193113929
     2      1.52039    2981452
     4      0.00025        485
     8      0.00062       1211
    16      0.00012        237
    32      0.00004         79
    64      0.00002         30
   128      0.00000          8
   256      0.00000          5
   512      0.00000          3
  1024      0.00000          1
  2048      0.00000          2

CLOCK_REALTIME
x4mmm@x4mmm-osx bin % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 15.04 ns
Histogram of timing durations:
  < us   % of total      count
     1     98.49709  196477842
     2      1.50268    2997479
     4      0.00007        130
     8      0.00012        238
    16      0.00005         91
    32      0.00000          4
    64      0.00000          1

Thanks!

#153

[1]: /messages/by-id/3110108.1719939353@sss.pgh.pa.us

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#152)

Re: UUID v7

On Fri, Nov 1, 2024 at 10:33 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 31 Oct 2024, at 23:04, Stepan Neretin <sndcppg@gmail.com> wrote:

Firstly, I'd like to discuss the increased_clock_precision variable, which
currently divides the timestamp into milliseconds and nanoseconds. However,
this approach only approximates the extra bits for sub-millisecond
precision, leading to imprecise timestamps in high-frequency UUID
generation.

No, timestamp is taken in nanoseconds, we keep precision of 1/4096 of ms. If you observe precision loss anywhere let me know.

To address this issue, we could consider using a more accurate method for
calculating the timestamp. For instance, we could utilize a higher
resolution clock or implement a more precise algorithm to ensure accurate
timestamps.

That's what we do.

Additionally, it would be beneficial to add validation checks for the
interval argument. These checks could verify that the input interval is
within reasonable bounds and that the calculated timestamp is accurate.
Examples of checks could include verifying if the interval is too small,
too large, or exceeds the maximum possible number of milliseconds and
nanoseconds in a timestamp.

timestamptz_pl_interval() is already doing this.

What do you think about these suggestions? Let me know your thoughts!

Thanks a lot for reviewing the patch!

On 1 Nov 2024, at 10:33, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Oct 31, 2024 at 9:53 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 1 Nov 2024, at 03:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Therefore, if the
system clock moves backward due to NTP, we cannot guarantee
monotonicity and sortability. Is that right?

Not exactly. Monotonicity is ensured for a given backend. We make sure that timestamp is advanced at least for ~250ns forward on each UUID generation. 60 bits of time are unique and ascending for a given backend.

Thank you for your explanation. I now understand this code guarantees
the monotonicity:
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)
+       ns = get_real_time_ns();
+       if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+               ns = previous_ns + SUB_MILLISECOND_STEP;
+       previous_ns = ns;
I think that one of the most important parts in UUIDv7 implementation
is which method (1, 2, or 3 described in RFC 9562) we use to guarantee
the monotonicity. The current patch employs method 3 with the
assumption that 12 bits of sub-millisecond information is available on
most of the systems we support. However, as far as I tested, on MacOS,
values returned by clock_gettime(CLOCK_REALTIME) are only microsecond
precision, meaning that we could waste some randomness. Has this point
been considered?
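The interplay between the 12-bit fraction and SUB_MILLISECOND_STEP quoted above can be illustrated in standalone C. This is only a sketch under stated assumptions, not the patch code: next_monotonic_ns() and sub_ms_fraction_12bit() are hypothetical names, and the clock reading is passed in as a parameter instead of being taken from get_real_time_ns().

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS 1000000
/* smallest ns step that always advances a 1/4096 ms fraction (245 ns) */
#define SUB_MILLISECOND_STEP (NS_PER_MS / 4096 + 1)

/* Last timestamp handed out (hypothetical per-backend state). */
static uint64_t previous_ns = 0;

/* Return a timestamp at least SUB_MILLISECOND_STEP ahead of the previous one. */
static uint64_t
next_monotonic_ns(uint64_t clock_ns)
{
	uint64_t	ns = clock_ns;

	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
		ns = previous_ns + SUB_MILLISECOND_STEP;
	previous_ns = ns;
	return ns;
}

/* Scale the sub-millisecond part of ns to 12 bits (units of 1/4096 ms). */
static uint16_t
sub_ms_fraction_12bit(uint64_t ns)
{
	return (uint16_t) (((ns % NS_PER_MS) * 4096) / NS_PER_MS);
}
```

Because 245 ns is slightly more than 1000000/4096 ns, two calls that see the same clock value still differ by at least one unit in the 12-bit fraction (or in the millisecond part when the fraction wraps).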
There was a thread "What is a typical precision of gettimeofday()?" [0]
There we found out that routines of instr_time.h are precise enough. On my machine (MacBook Air M3) I do not observe significant differences between CLOCK_MONOTONIC_RAW and CLOCK_REALTIME in pg_test_timing results.

CLOCK_MONOTONIC_RAW
x4mmm@x4mmm-osx bin % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 15.30 ns
Histogram of timing durations:
< us % of total count
1 98.47856 193113929
2 1.52039 2981452
4 0.00025 485
8 0.00062 1211
16 0.00012 237
32 0.00004 79
64 0.00002 30
128 0.00000 8
256 0.00000 5
512 0.00000 3
1024 0.00000 1
2048 0.00000 2

CLOCK_REALTIME
x4mmm@x4mmm-osx bin % ./pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 15.04 ns
Histogram of timing durations:
< us % of total count
1 98.49709 196477842
2 1.50268 2997479
4 0.00007 130
8 0.00012 238
16 0.00005 91
32 0.00000 4
64 0.00000 1

I applied the patch shared on that thread [1] to measure nanoseconds
and changed instr_time.h to use CLOCK_REALTIME even on macOS. Here are
the results on my machine (macOS 14.7, M1 Pro):

Testing timing overhead for 3 seconds.
Per loop time including overhead: 18.61 ns
Histogram of timing durations:
<= ns % of total running % count
0 98.1433 98.1433 158212921
1 0.0000 98.1433 0
3 0.0000 98.1433 0
7 0.0000 98.1433 0
15 0.0000 98.1433 0
31 0.0000 98.1433 0
63 0.0000 98.1433 0
127 0.0000 98.1433 0
255 0.0000 98.1433 0
511 0.0000 98.1433 0
1023 1.8560 99.9994 2992054
2047 0.0000 99.9994 51
4095 0.0001 99.9995 110
8191 0.0003 99.9998 463
16383 0.0002 100.0000 313
32767 0.0000 100.0000 49
65535 0.0000 100.0000 4

Timing durations less than 128 ns:
ns % of total running % count
0 98.1433 98.1433 158212921

Most of the timing durations fell into the 0 ns bin. The others fell
into the 1023 ns bin or higher.

I've done a simple test as well on my Mac and saw that the time
returned by clock_gettime(CLOCK_REALTIME) doesn't have nanosecond
precision:

% cat test.c
#include <stdio.h>
#include <time.h>
#include <sys/time.h>

int
main(void)
{
	struct timespec real;
	struct timespec mono;
	struct timespec mono_raw;

	clock_gettime(CLOCK_REALTIME, &real);
	clock_gettime(CLOCK_MONOTONIC, &mono);
	clock_gettime(CLOCK_MONOTONIC_RAW, &mono_raw);

	printf("real: %ld\t%ld\n", real.tv_sec, real.tv_nsec);
	printf("mono: %ld\t%ld\n", mono.tv_sec, mono.tv_nsec);
	printf("mono_raw: %ld\t%ld\n", mono_raw.tv_sec, mono_raw.tv_nsec);

	return 0;
}
% gcc -o test test.c
% ./test
real: 1730495955 515018000
mono: 3212977 834578000
mono_raw: 3212982 962799958
% ./test
real: 1730495956 78927000
mono: 3212978 398488000
mono_raw: 3212983 526718333
% ./test
real: 1730495956 652751000
mono: 3212978 972312000
mono_raw: 3212984 100552333
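The pattern in this output (tv_nsec always ending in 000 for CLOCK_REALTIME and CLOCK_MONOTONIC, but not for CLOCK_MONOTONIC_RAW) can be checked mechanically. A minimal sketch, where all_whole_microseconds() is a hypothetical helper and the sample values are copied from the runs above; a handful of samples only suggests microsecond granularity, it does not prove it.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every sample is a whole number of microseconds, else 0. */
static int
all_whole_microseconds(const long *tv_nsec, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (tv_nsec[i] % 1000 != 0)
			return 0;
	}
	return 1;
}
```

Fed with the three "real" values it returns 1, and with the three "mono_raw" values it returns 0.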

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#154

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#153)

1 attachment(s)

Re: UUID v7

On 2 Nov 2024, at 02:23, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Most of the timing durations fell into the 0 ns bin. The others fell
into the 1023 ns bin or higher.

Indeed. We cannot have these 2 bits from nanoseconds :(
I've tried to come up with some clever tricks, but have not succeeded.
Let's use only microseconds.
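To make the microsecond-only layout concrete, here is a hedged standalone sketch of how a microsecond timestamp could be split into the 48-bit unix_ts_ms field and a 10-bit fraction in rand_a. pack_v7_time_us() and unpack_v7_time_us() are hypothetical names for illustration; the attached patch is the authoritative version.

```c
#include <assert.h>
#include <stdint.h>

/* Write ms (48 bits), version 7 and a 10-bit microsecond fraction. */
static void
pack_v7_time_us(uint8_t uuid[16], uint64_t us)
{
	uint64_t	unix_ts_ms = us / 1000;
	uint16_t	frac = (uint16_t) (us % 1000);	/* 0..999 fits in 10 bits */

	uuid[0] = (uint8_t) (unix_ts_ms >> 40);
	uuid[1] = (uint8_t) (unix_ts_ms >> 32);
	uuid[2] = (uint8_t) (unix_ts_ms >> 24);
	uuid[3] = (uint8_t) (unix_ts_ms >> 16);
	uuid[4] = (uint8_t) (unix_ts_ms >> 8);
	uuid[5] = (uint8_t) unix_ts_ms;
	/* version 7 in the high nibble, top 4 bits of the fraction below it */
	uuid[6] = (uint8_t) (0x70 | (frac >> 6));
	/* low 6 bits of the fraction; keep the 2 low bits already in uuid[7] */
	uuid[7] = (uint8_t) (((frac & 0x3f) << 2) | (uuid[7] & 0x03));
}

/* Recover the microsecond timestamp from the first 8 bytes. */
static uint64_t
unpack_v7_time_us(const uint8_t uuid[16])
{
	uint64_t	ms = 0;
	uint16_t	frac;

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | uuid[i];
	frac = (uint16_t) (((uuid[6] & 0x0f) << 6) | (uuid[7] >> 2));
	return ms * 1000 + frac;
}
```

The two low bits of byte 7 are left for random data, so only 1000 of the 1024 values representable in the 10-bit field are ever used.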

Best regards, Andrey Borodin.

Attachments:

v30-0001-Implement-UUID-v7.patch (application/octet-stream)

From d2306902956be14a76e37ae714072e6ac59024ca Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v30] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 +++-
 src/backend/utils/adt/uuid.c             | 146 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  41 ++++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 7 files changed, 227 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 05f630c6a6..8f69f4d3b6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..2f45086237 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -401,6 +403,12 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,7 +421,7 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * http://tools.ietf.org/html/rfc9562#section-4.4
 	 */
 	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
 	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
@@ -421,12 +429,121 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 10 bits in "rand_a" bits to store microseconds.
+ * Usage of pg_test_timing indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+static Datum
+generate_uuidv7(FunctionCallInfo fcinfo)
+{
+	static uint64 previous_us = 0;
+
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 us;
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+	struct timeval tv;
+
+	gettimeofday(&tv, NULL);
+
+	us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+	if (previous_us >= us)
+		us = previous_us + 1;
+	previous_us = us;
+
+	if (PG_NARGS() > 0)
+	{
+		/*
+		 * We are given a time shift interval as an argument.
+		 * To make correct computations we call
+		 * timestamptz_pl_interval() with corresponding logic. This logic is
+		 * implemented on TimestampTz, so we have to convert there and back.
+		 */
+		Interval *span;
+		/* Convert time part of UUID to Timestamptz (us since Postgres epoch) */
+		TimestampTz ts = (TimestampTz) (us -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC);
+		span = PG_GETARG_INTERVAL_P(0);
+		/* Compute shifted time */
+		ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													 TimestampTzGetDatum(ts),
+													 IntervalPGetDatum(span)));
+		/* Convert TimestampTz back and carry nanoseconds. */
+		us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC);
+	}
+
+	unix_ts_ms = us / 1000;
+	/* microsecond fraction (used to fill 10 bits) */
+	increased_clock_precision = us % 1000;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 6);
+	uuid->data[7] = (unsigned char) (increased_clock_precision << 2);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/* Take 2 bits of entropy from overwritten part */
+	uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	/* set version field, top four bits are 0, 1, 1, 1 */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Entry point for uuidv7()
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(fcinfo);
+}
+
+/*
+ * Entry point for uuidv7(interval)
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(fcinfo);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +553,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +572,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = uuid->data[5];
+		tms += ((uint64) uuid->data[4]) << 8;
+		tms += ((uint64) uuid->data[3]) << 16;
+		tms += ((uint64) uuid->data[2]) << 24;
+		tms += ((uint64) uuid->data[1]) << 32;
+		tms += ((uint64) uuid->data[0]) << 40;
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +599,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +607,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1ec0d6f6b5..3c426ca532 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9340,11 +9340,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..aa6224e81b 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..eec7f160f8 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,12 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)
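For readers who want to check the v7 test vector outside the server, the extraction logic can be sketched in standalone C. This is an illustration under assumptions, not the patch code: uuid_version() and v7_timestamp_pg_us() are hypothetical names, and UNIX_TO_PG_EPOCH_US hard-codes the 10957 days between 1970-01-01 and 2000-01-01.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds between the Unix epoch and the PostgreSQL epoch. */
#define UNIX_TO_PG_EPOCH_US (INT64_C(10957) * 86400 * 1000000)

/* Return the version nibble, or -1 if the variant bits are not 0b10. */
static int
uuid_version(const uint8_t uuid[16])
{
	if ((uuid[8] & 0xc0) != 0x80)
		return -1;
	return uuid[6] >> 4;
}

/* Read the 48-bit big-endian unix_ts_ms, return us since 2000-01-01 UTC. */
static int64_t
v7_timestamp_pg_us(const uint8_t uuid[16])
{
	uint64_t	tms = 0;

	for (int i = 0; i < 6; i++)
		tms = (tms << 8) | uuid[i];
	return (int64_t) (tms * 1000) - UNIX_TO_PG_EPOCH_US;
}
```

For the vector 017F22E2-79B0-7CC3-98C4-DC0C0C07398F the 48-bit field is 1645557742000 ms, and the all-ones UUID used in the regression tests is rejected by the variant check.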

#155

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Andrey M. Borodin (#154)

Re: UUID v7

On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v30-0001-Implement-UUID-v7.patch>

Some more thoughts on this patch version:

0. Comment mentioning nanoseconds, while we do not need to carry anything
/* Convert TimestampTz back and carry nanoseconds. */

1. There's unnecessary &3 in
uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);

2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow. That would slightly increase non-overflowing capacity when generating more than a million UUIDs per second on one backend. However, given the current performance of our CSPRNG, I do not think this feature is worth the code complexity.

Best regards, Andrey Borodin.

#156

[1]: /messages/by-id/305478845.5279532.1712440778735@mail.yahoo.com

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#155)

Re: UUID v7

On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v30-0001-Implement-UUID-v7.patch>

Some more thoughts on this patch version:

0. Comment mentioning nanoseconds, while we do not need to carry anything
/* Convert TimestampTz back and carry nanoseconds. */

1. There's unnecessary &3 in
uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);

2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow. That would slightly increase non-overflowing capacity when generating more than a million UUIDs per second on one backend. However, given the current performance of our CSPRNG, I do not think this feature is worth the code complexity.

While using only 10 bits microseconds makes the implementation simple,
I'm not sure if 10 bits is enough to generate UUIDs at microsecond
granularity without losing monotonicity. Since 10-bit microseconds are
used as is in rand_a space, 1000 UUIDs can be generated per
millisecond without losing monotonicity.

For example, in my environment, it took 1808 milliseconds to generate
1 million UUIDs. This is about 553 UUIDs generated per millisecond. As
UUID generation performance improves, I think 10 bits will not be
enough.

=# select count(uuidv7()) from generate_series(1, 1_000_000);
count
---------
1000000
(1 row)

Time: 1808.734 ms
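The arithmetic behind this concern is easy to reproduce. A small sketch, with hypothetical helper names, that compares the measured rate against the 1000 distinct values per millisecond that a microsecond fraction can represent:

```c
#include <assert.h>

/* UUIDs generated per millisecond for a given run. */
static double
uuids_per_ms(double count, double elapsed_ms)
{
	return count / elapsed_ms;
}

/* Share of the 1000-values-per-ms budget of a microsecond fraction. */
static double
budget_used(double rate_per_ms)
{
	return rate_per_ms / 1000.0;
}
```

With the timing above, uuids_per_ms(1000000, 1808.734) is a bit over half of that budget, so a generator roughly twice as fast would start running ahead of the clock.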

I found a similar comment from Sergey Prokhorenko [1]. He also mentioned:

4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.

I feel that it's better to switch to Method 1 or 2 with 12 bits or
larger counter space.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#157

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#156)

Re: UUID v7

On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v30-0001-Implement-UUID-v7.patch>

Some more thoughts on this patch version:

0. Comment mentioning nanoseconds, while we do not need to carry anything
/* Convert TimestampTz back and carry nanoseconds. */

1. There's unnecessary &3 in
uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);

2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow. That would slightly increase non-overflowing capacity when generating more than a million UUIDs per second on one backend. However, given the current performance of our CSPRNG, I do not think this feature is worth the code complexity.

While using only 10 bits microseconds makes the implementation simple,
I'm not sure if 10 bits is enough to generate UUIDs at microsecond
granularity without losing monotonicity. Since 10-bit microseconds are
used as is in rand_a space, 1000 UUIDs can be generated per
millisecond without losing monotonicity.

We won’t lose monotonicity on one backend. We will just accumulate time shift.
See
+	us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+	if (previous_us >= us)
+		us = previous_us + 1;
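The "accumulate time shift" behaviour can be seen in isolation with a standalone sketch. next_us() is a hypothetical stand-in for the three lines above, with the clock reading passed in as a parameter so the effect is reproducible:

```c
#include <assert.h>
#include <stdint.h>

/* Last microsecond value handed out (hypothetical per-backend state). */
static uint64_t previous_us = 0;

/* Return a strictly increasing microsecond value for each call. */
static uint64_t
next_us(uint64_t clock_us)
{
	uint64_t	us = clock_us;

	if (previous_us >= us)
		us = previous_us + 1;	/* run ahead of the clock */
	previous_us = us;
	return us;
}
```

Three calls within the same clock microsecond 1000 yield 1000, 1001 and 1002; the generator then stays ahead of the real clock until the clock catches up, which is the time shift mentioned above.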

For example, in my environment, it took 1808 milliseconds to generate
1 million UUIDs. This is about 553 UUIDs generated per millisecond.

BTW we can further improve this performance by buffering CSPRNG. See v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch in this thread.

As
UUID generation performance improves, I think 10 bits will not be
enough.

=# select count(uuidv7()) from generate_series(1, 1_000_000);
count
---------
1000000
(1 row)

Time: 1808.734 ms

I found a similar comment from Sergey Prokhorenko[1]. He also mentioned:

4) Microsecond timestamp fraction subtracts 10 bits from random data, which increases the risk of collision. In the counter, almost all bits are initialized with a random number, which reduces the risk of collision.

I feel that it's better to switch to Method 1 or 2 with 12 bits or
larger counter space.

12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter. See version v24-0001-Implement-UUID-v7.patch
This version is more resilient to generating a lot of UUIDs on one backend while still not accumulating time shift.
Yet, UUIDs generated on parallel workers will lose some sortability.

Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.

Best regards, Andrey Borodin.

#158

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Andrey M. Borodin (#157)

Re: UUID v7

On Thursday 7 November 2024 at 11:34:31 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter. See version v24-0001-Implement-UUID-v7.patch
This version is more resilient to generating a lot of UUIDs on one backend while still not accumulating time shift.
Yet, UUIDs generated on parallel workers will lose some sortability.

Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.

Best regards, Andrey Borodin.

------------------------------------------------------------------------

In fact, the RFC does allow combining methods 3 and 1:
https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-7
5.7. UUID Version 7

"Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this order from the most significant bits to the least, to guarantee additional monotonicity within a millisecond:

1. An OPTIONAL sub-millisecond timestamp fraction (12 bits at maximum) as per Section 6.2 (Method 3).
2. An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1 or 2).
3. Random data for each new UUIDv7 generated for any remaining space."

This clearly refers to a "combination of the following subfields". COMBINATION!!!
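One possible reading of that combination, sketched in standalone C. The field widths are assumptions chosen for illustration only (a 12-bit fraction in rand_a, then a 14-bit counter and 48 random bits in rand_b); neither the RFC text quoted here nor the patch prescribes these exact sizes, and fill_v7_tail() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fill bytes 6..15 of a UUIDv7 with fraction, counter and random data,
 * from most significant to least. 12 + 14 + 48 = 74 bits.
 */
static void
fill_v7_tail(uint8_t uuid[16], uint16_t frac12, uint16_t counter14,
			 uint64_t random48)
{
	/* version 7 plus the top 4 bits of the fraction */
	uuid[6] = (uint8_t) (0x70 | ((frac12 >> 8) & 0x0f));
	uuid[7] = (uint8_t) (frac12 & 0xff);
	/* variant 0b10 plus the top 6 bits of the counter */
	uuid[8] = (uint8_t) (0x80 | ((counter14 >> 8) & 0x3f));
	uuid[9] = (uint8_t) (counter14 & 0xff);
	/* remaining 48 bits are random, stored big-endian */
	for (int i = 0; i < 6; i++)
		uuid[10 + i] = (uint8_t) (random48 >> (8 * (5 - i)));
}
```

Because the fraction sits above the counter, which sits above the random bits, byte-wise comparison of two UUIDs with the same millisecond orders them by fraction first and counter second.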

However, with the current performance of computers, method 3 is quite sufficient without the addition of method 1.
Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

#159

sawada.mshk@gmail.com

about 1 year ago

In reply to: Sergey Prokhorenko (#158)

Re: UUID v7

On Fri, Nov 8, 2024 at 1:25 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

On Thursday 7 November 2024 at 11:34:31 am GMT+3, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

12 bits does not differ much. We can have much longer counters. Before switching to Method 3 I used 18 bits counter. See version v24-0001-Implement-UUID-v7.patch
This version is more resilient to generating a lot of UUIDs on one backend while still not accumulating time shift.
Yet, UUIDs generated on parallel workers will lose some sortability.

Personally, I think both methods are good. I’d even combine them both. But RFC seems to be not allowing this. BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.

Best regards, Andrey Borodin.

------------------------------------------------------------------------

In fact, the RFC does allow combining methods 3 and 1:
https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-7
5.7. UUID Version 7

"Alternatively, implementations MAY fill the 74 bits, jointly, with a combination of the following subfields, in this order from the most significant bits to the least, to guarantee additional monotonicity within a millisecond:

1. An OPTIONAL sub-millisecond timestamp fraction (12 bits at maximum) as per Section 6.2 (Method 3).
2. An OPTIONAL carefully seeded counter as per Section 6.2 (Method 1 or 2).
3. Random data for each new UUIDv7 generated for any remaining space."

This clearly refers to a "combination of the following subfields". COMBINATION!!!

However, with the current performance of computers, method 3 is quite sufficient without the addition of method 1.

Do you think method 3 is sufficient even with microsecond precision
(i.e. storing only 10 bits microseconds in rand_a space)?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#160

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#157)

Re: UUID v7

On Thu, Nov 7, 2024 at 12:34 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 7 Nov 2024, at 12:42, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Nov 6, 2024 at 10:14 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 5 Nov 2024, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

<v30-0001-Implement-UUID-v7.patch>

Some more thoughts on this patch version:

0. Comment mentioning nanoseconds, while we do not need to carry anything
/* Convert TimestampTz back and carry nanoseconds. */

1. There's unnecessary &3 in
uuid->data[7] = uuid->data[7] | ((uuid->data[8] >> 6) & 3);

2. Currently we store 0..999 microseconds in 10 bits, so values 1000..1023 are unused. We could use them for overflow. That would slightly increase non-overflowing capacity when generating more than a million UUIDs per second on one backend. However, given the current performance of our CSPRNG, I do not think this feature is worth the code complexity.

While using only 10 bits microseconds makes the implementation simple,
I'm not sure if 10 bits is enough to generate UUIDs at microsecond
granularity without losing monotonicity. Since 10-bit microseconds are
used as is in rand_a space, 1000 UUIDs can be generated per
millisecond without losing monotonicity.
We won't lose monotonicity on one backend. We will just accumulate a time shift.
See
+       us = tv.tv_sec * SECS_PER_DAY * USECS_PER_SEC + tv.tv_usec;
+       if (previous_us >= us)
+               us = previous_us + 1;
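
A minimal standalone sketch of that clamping, to show how the time shift accumulates. The helper name next_monotonic_us() and the explicit now_us argument are assumptions made so the behavior can be tested; the patch reads the clock itself.

```c
#include <assert.h>
#include <stdint.h>

/* Last timestamp handed out by this process (one backend in PostgreSQL terms). */
static uint64_t previous_us = 0;

/*
 * Hypothetical helper: return a strictly increasing microsecond timestamp.
 * now_us is the clock reading, passed in so the behavior is easy to test.
 */
static uint64_t
next_monotonic_us(uint64_t now_us)
{
    uint64_t    us = now_us;

    /* Clock did not advance (or went backwards): run ahead of real time. */
    if (previous_us >= us)
        us = previous_us + 1;

    /* Postcondition: every returned value exceeds the previous one. */
    assert(us > previous_us);
    previous_us = us;
    return us;
}
```

Three calls with the same reading 1000 return 1000, 1001 and 1002, so the generated timestamps drift ahead of the wall clock until the real clock catches up.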

IIUC the microsecond part is working also as a counter in a sense. It
seems fine to me but I'm slightly concerned that there is no guidance
of such implementation in RFC 9562.

BTW if we just continue to use nanoseconds patch, zero bits will act exactly as counters.

Yes, but we will lose some randomness on macOS as the nanosecond part
is 0 in most cases.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#161

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Masahiko Sawada (#159)

Re: UUID v7

On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

the microsecond part is working also as a counter in a sense. It seems fine to me but I'm slightly concerned that there is no guidance of such implementation in RFC 9562.

In fact, there is guidance of similar implementation in RFC 9562:
https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
"Counter Rollover Handling:"
"Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter."

Do you think method 3 is sufficient even with microsecond precision (i.e. storing only 10 bits microseconds in rand_a space)?

The maximum write performance in PostgreSQL is approximately 500 rows per millisecond, but under normal conditions 50 rows per millisecond. This corresponds to a precision of 2 microseconds and 20 microseconds respectively.
Andrey Borodin's implementation of method 3 provides a precision of approximately 0.25 microseconds.
You offer a precision of approximately 0.98 microseconds. This is about twice as good as what is needed for 500 rows per millisecond write performance. But in the near future, this may not be enough for the highest-performance systems.
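
The arithmetic behind these figures can be checked with a short sketch. The write rates (500 and 50 rows per millisecond) are the estimates quoted above, not measurements, and the helper names are hypothetical. The step sizes assume a millisecond divided evenly into 2^bits parts.

```c
#include <assert.h>

/* Average time available per generated row, in microseconds. */
static double
us_per_row(double rows_per_ms)
{
    assert(rows_per_ms > 0.0);
    return 1000.0 / rows_per_ms;
}

/* Size of one timestamp step when a millisecond is split into 2^bits parts. */
static double
step_us(unsigned bits)
{
    assert(bits < 31);
    return 1000.0 / (double) (1u << bits);
}
```

With these helpers, us_per_row(500) is 2 and us_per_row(50) is 20, while step_us(12) is about 0.244 and step_us(10) is about 0.977, matching the rounded values in the text.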

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

#162

sawada.mshk@gmail.com

about 1 year ago

In reply to: Sergey Prokhorenko (#161)

Re: UUID v7

On Sat, Nov 9, 2024 at 9:07 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

the microsecond part is working also as a counter in a sense. It seems fine to me but I'm slightly concerned that there is no guidance of such implementation in RFC 9562.

In fact, there is guidance of similar implementation in RFC 9562:
https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
"Counter Rollover Handling:"
"Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter."

Indeed, thank you.

But in the near future, this may not be enough for the highest-performance systems.

Yeah, I'm concerned about this. That time might gradually come. That
being said, as long as rand_a part works also as a counter, it's fine.
Also, 12 bits does not differ much as Andrey Borodin mentioned. I
think in the first version it's better to start with a simple
implementation rather than over-engineering it.

Regarding the implementation, the v30 patch uses only microseconds
precision time even on platforms where nanoseconds precision is
available such as Linux. I think it's better to store the value of
(sub-milliseconds * 4096) into 12-bits of rand_a space instead of
directly storing microseconds into 10 bits space. That way, we can use
nanoseconds precision timestamps where available. On some platforms
such as macOS, the sub-millisecond timestamp is restricted to
microsecond precision; we can consider that a kind of special case. If
12 bits of rand_a space are not enough to guarantee monotonicity in
the future, it is also possible to improve it by putting a (random)
counter into rand_b.
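
A minimal sketch of the "(sub-milliseconds * 4096)" mapping into 12 bits, assuming a nanosecond timestamp as input. The function name is hypothetical and the code is a standalone illustration, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS UINT64_C(1000000)

/*
 * Scale the sub-millisecond part of a nanosecond timestamp to the range
 * 0..4095, so that it fits the 12 bits of rand_a.
 */
static uint16_t
sub_ms_fraction_12bit(uint64_t ns)
{
    uint64_t    frac = ((ns % NS_PER_MS) * 4096) / NS_PER_MS;

    assert(frac < 4096);
    return (uint16_t) frac;
}
```

Half a millisecond (500000 ns) maps to 2048, and the last nanosecond of a millisecond maps to 4095. On a clock that only reports whole microseconds, at most 1000 of the 4096 possible values can occur.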

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#163

sawada.mshk@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#162)

Re: UUID v7

On Mon, Nov 11, 2024 at 12:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Nov 9, 2024 at 9:07 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

On Saturday 9 November 2024 at 01:00:15 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

the microsecond part is working also as a counter in a sense. It seems fine to me but I'm slightly concerned that there is no guidance of such implementation in RFC 9562.

In fact, there is guidance of similar implementation in RFC 9562:
https://datatracker.ietf.org/doc/html/rfc9562#name-monotonicity-and-counters
"Counter Rollover Handling:"
"Alternatively, implementations MAY increment the timestamp ahead of the actual time and reinitialize the counter."

Indeed, thank you.

But in the near future, this may not be enough for the highest-performance systems.

Yeah, I'm concerned about this. That time might gradually come. That
being said, as long as rand_a part works also as a counter, it's fine.
Also, 12 bits does not differ much as Andrey Borodin mentioned. I
think in the first version it's better to start with a simple
implementation rather than over-engineering it.

Regarding the implementation, the v30 patch uses only microseconds
precision time even on platforms where nanoseconds precision is
available such as Linux. I think it's better to store the value of
(sub-milliseconds * 4096) into 12-bits of rand_a space instead of
directly storing microseconds into 10 bits space.

IIUC v29 patch implements UUIDv7 generation in this way. So I've
reviewed v29 patch and here are some review comments:

---
     * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-    * http://tools.ietf.org/html/rfc4122#section-4.4
+    * http://tools.ietf.org/html/rfc9562#section-4.4
     */

The new RFC doesn't have section 4.4.

---
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant 0b10 bits.

I'm concerned that "version and variant 0b10 bits" is not very clear
to readers. I think we can just mention "... except version and
variant bits".

---
+
+#ifndef WIN32
+#include <time.h>
+
+static uint64 get_real_time_ns()
+{
+       struct timespec tmp;
+
+       clock_gettime(CLOCK_REALTIME, &tmp);
+       return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+#include "c.h"
+#include <sysinfoapi.h>
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+/*
+ * FILETIME represents the number of 100-nanosecond intervals since
+ * January 1, 1601 (UTC).
+ */
+#define FILETIME_UNITS_TO_NS UINT64CONST(100)
+
+
+/*
+ * timezone information is stored outside the kernel so tzp isn't used anymore.
+ *
+ * Note: this function is not for Win32 high precision timing purposes. See
+ * elapsed_time().
+ */
+static uint64
+get_real_time_ns()
+{
+       FILETIME        file_time;
+       ULARGE_INTEGER ularge;
+
+       GetSystemTimePreciseAsFileTime(&file_time);
+       ularge.LowPart = file_time.dwLowDateTime;
+       ularge.HighPart = file_time.dwHighDateTime;
+
+       return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+#endif

I think that it's better to implement these functions in instr_time.h
or another file.

---
+/* minimum amount of ns that guarantees step of increased_clock_precision */
+#define SUB_MILLISECOND_STEP (1000000/4096 + 1)

I think we can rewrite it to:

#define NS_PER_MS INT64CONST(1000000)
#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)

Which improves the readability.

Also, I think "#define NS_PER_US INT64CONST(1000)" can also be used in
many places.
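
To see why the step needs the "+ 1", here is a standalone sketch using the macros as proposed above. INT64_C stands in for PostgreSQL's INT64CONST, and uuidv7_time_position() is a hypothetical helper that combines the millisecond value and the 12-bit fraction into one sortable number.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS INT64_C(1000000)  /* stand-in for INT64CONST(1000000) */
#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)  /* 244 + 1 = 245 */

/* Hypothetical helper: milliseconds and 12-bit fraction as one number. */
static uint64_t
uuidv7_time_position(uint64_t ns)
{
    const uint64_t ns_per_ms = (uint64_t) NS_PER_MS;
    uint64_t    ms = ns / ns_per_ms;
    uint64_t    frac = ((ns % ns_per_ms) << 12) / ns_per_ms;

    assert(frac < 4096);
    return (ms << 12) | frac;
}
```

Advancing by 244 ns is not always enough: 0 ns and 244 ns both map to fraction 0. Advancing by 245 ns adds a little more than 1/4096 of a millisecond, so the position increases every time.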

---
+       /* set version field, top four bits are 0, 1, 1, 1 */
+       uuid->data[6] = (uuid->data[6] & 0x0f) | 0x70;
+       /* set variant field, top two bits are 1, 0 */
+       uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;

I think we can make an inline function to set both variant and version
so we can use it for generating UUIDv4 and UUIDv7.
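
One possible shape for such a helper, as a standalone sketch. The type sketch_uuid_t and the function name are placeholders for illustration; PostgreSQL's pg_uuid_t is not used here so that the example compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

#define UUID_LEN 16

/* Placeholder for pg_uuid_t: 16 raw bytes. */
typedef struct
{
    unsigned char data[UUID_LEN];
} sketch_uuid_t;

/* Set the 4-bit version field and the 2-bit variant field (binary 10). */
static inline void
sketch_uuid_set_version(sketch_uuid_t *uuid, unsigned char version)
{
    assert(version <= 0x0f);

    /* version: top four bits of byte 6 */
    uuid->data[6] = (unsigned char) ((uuid->data[6] & 0x0f) | (version << 4));
    /* variant: top two bits of byte 8 are 1, 0 */
    uuid->data[8] = (unsigned char) ((uuid->data[8] & 0x3f) | 0x80);
}
```

The same call then serves both generators: version 4 after filling all bytes randomly, version 7 after filling in the timestamp part.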

---
+               tms = uuid->data[5];
+               tms += ((uint64) uuid->data[4]) << 8;
+               tms += ((uint64) uuid->data[3]) << 16;
+               tms += ((uint64) uuid->data[2]) << 24;
+               tms += ((uint64) uuid->data[1]) << 32;
+               tms += ((uint64) uuid->data[0]) << 40;

How about rewriting these to the following for consistency with UUIDv1 codes?

        tms = uuid->data[5]
        + ((uint64) uuid->data[4] << 8)
        + ((uint64) uuid->data[3] << 16)
        + ((uint64) uuid->data[2] << 24)
        + ((uint64) uuid->data[1] << 32)
        + ((uint64) uuid->data[0] << 40);
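
The expression can be tried in isolation with the RFC 9562 test vector for v7 that the regression tests use (017F22E2-79B0-7CC3-98C4-DC0C0C07398F). This is a standalone sketch: the function name is hypothetical and uint64_t replaces PostgreSQL's uint64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 48-bit big-endian unix_ts_ms field from the first six bytes. */
static uint64_t
uuidv7_unix_ts_ms(const unsigned char *data)
{
    assert(data != NULL);

    return (uint64_t) data[5]
        + ((uint64_t) data[4] << 8)
        + ((uint64_t) data[3] << 16)
        + ((uint64_t) data[2] << 24)
        + ((uint64_t) data[1] << 32)
        + ((uint64_t) data[0] << 40);
}
```

For the test vector the first six bytes are 01 7F 22 E2 79 B0, which gives 1645557742000 milliseconds since the Unix epoch.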

---
Thinking about the function structures more, I think we can refactor
generate_uuidv7(), uuidv7() and uuidv7_interval():

- create a function, get_clock_timestamp_ns(), that provides a
nanosecond-precision timestamp
- the returned timestamp is guaranteed to be greater than the
previous returned value.
- this function can be inlined.
- create a function, generate_uuidv7(), that takes a
nanosecond-precision timestamp as a function argument, and generates
a UUIDv7 based on it.
- this function can be inlined too.
- uuidv7() gets the timestamp from get_clock_timestamp_ns() and passes
it to generate_uuidv7().
- uuidv7_interval() gets the timestamp from get_clock_timestamp_ns(),
adjusts it based on the given interval, and passes it to generate_uuidv7().
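
The proposed structure could look roughly like the following standalone sketch. Several parts are assumptions made to keep it self-contained: timespec_get() stands in for the patch's clock source, the random bytes are passed in by the caller instead of coming from pg_strong_random(), the interval variant takes a plain nanosecond shift instead of an Interval, and all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC UINT64_C(1000000000)
#define NS_PER_MS  UINT64_C(1000000)
#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)

typedef struct
{
    unsigned char data[16];
} sketch_uuid_t;

/* Nanosecond timestamp, always greater than the previously returned one. */
static uint64_t
get_clock_timestamp_ns(void)
{
    static uint64_t previous_ns = 0;
    struct timespec ts;
    uint64_t    ns;
    int         base = timespec_get(&ts, TIME_UTC);

    assert(base == TIME_UTC);
    (void) base;
    ns = (uint64_t) ts.tv_sec * NS_PER_SEC + (uint64_t) ts.tv_nsec;
    if (previous_ns + SUB_MILLISECOND_STEP >= ns)
        ns = previous_ns + SUB_MILLISECOND_STEP;
    previous_ns = ns;
    return ns;
}

/* Build a UUIDv7 from a nanosecond timestamp and 8 caller-supplied random bytes. */
static sketch_uuid_t
generate_uuidv7(uint64_t ns, const unsigned char rnd[8])
{
    sketch_uuid_t uuid;
    uint64_t    unix_ts_ms = ns / NS_PER_MS;
    uint16_t    frac = (uint16_t) (((ns % NS_PER_MS) << 12) / NS_PER_MS);
    int         i;

    for (i = 0; i < 6; i++)     /* 48-bit big-endian millisecond timestamp */
        uuid.data[i] = (unsigned char) (unix_ts_ms >> (40 - 8 * i));
    uuid.data[6] = (unsigned char) (0x70 | (frac >> 8));    /* version 7 */
    uuid.data[7] = (unsigned char) (frac & 0xff);
    uuid.data[8] = (unsigned char) (0x80 | (rnd[0] & 0x3f)); /* variant 10 */
    for (i = 1; i < 8; i++)
        uuid.data[8 + i] = rnd[i];
    return uuid;
}

/* uuidv7(): current timestamp. */
static sketch_uuid_t
sketch_uuidv7(const unsigned char rnd[8])
{
    return generate_uuidv7(get_clock_timestamp_ns(), rnd);
}

/* uuidv7(interval), simplified: shift the timestamp by shift_ns nanoseconds. */
static sketch_uuid_t
sketch_uuidv7_shifted(int64_t shift_ns, const unsigned char rnd[8])
{
    return generate_uuidv7(get_clock_timestamp_ns() + (uint64_t) shift_ns, rnd);
}
```

Keeping the ascending-clock logic in one function means both entry points share the same monotonicity guarantee, and generate_uuidv7() stays a pure function of its inputs.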

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#164

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#163)

1 attachment(s)

Re: UUID v7

On 15 Nov 2024, at 06:44, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

IIUC v29 patch implements UUIDv7 generation in this way. So I've
reviewed v29 patch and here are some review comments:

I believe I've addressed all your comments in v31 with one exception:
get_clock_timestamp_ns() does not return ascending values, because it resides in too generic a place. So there's get_real_time_ns_ascending() in uuid.c.

Should we make a note about get_clock_timestamp_ns() returning only microseconds somewhere?

Also, maybe let's steal 2 random bits (from version or variant) and mix them into increased_clock_precision on macOS?

Thank you!

Best regards, Andrey Borodin.

Attachments:

v31-0001-Implement-UUID-v7.patch

From c4f919d4d15ec362f157736ea7a4bb49a890a437 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v31] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 165 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/include/portability/instr_time.h     |  22 +++
 src/port/win32gettimeofday.c             |  17 +++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  41 +++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 9 files changed, 283 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 73979f20ff..03161b3f87 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..6c5aee2e31 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,10 +13,13 @@
 
 #include "postgres.h"
 
+#include <sys/time.h>
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
+#include "portability/instr_time.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
@@ -37,6 +40,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline uint64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +406,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Set magic numbers for a UUID variant 3
+ * https://www.rfc-editor.org/rfc/rfc9562
+ */
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,20 +436,129 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 4);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Acquire nanosecond reading and ensure it is ascending (on this backend)
+ */
+static inline uint64 get_real_time_ns_ascending()
+{
+	static uint64 previous_ns = 0;
+	uint64 ns = get_real_time_ns();
+
+	/* minimum amount of ns that guarantees step of UUID increased clock precision */
+#define NS_PER_MS INT64CONST(1000000)
+#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+static Datum
+generate_uuidv7(uint64 ns)
+{
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Entry point for uuidv7()
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(get_real_time_ns_ascending());
+}
+
+/*
+ * Entry point for uuidv7(interval)
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	uint64 ns = get_real_time_ns_ascending();
+	/*
+	 * We are given a time shift interval as an argument.
+	 * The interval represents days, months and years, that are not fixed
+	 * number of nanoseconds. To make correct computations we call
+	 * timestamptz_pl_interval() with corresponding logic. This logic is
+	 * implemented with microsecond precision. So we carry nanoseconds
+	 * between computations.
+	 */
+	Interval *span = PG_GETARG_INTERVAL_P(0);
+	/* Convert time part of UUID to Timestamptz (ms since Postgres epoch) */
+	TimestampTz ts = (TimestampTz) (ns / 1000) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													TimestampTzGetDatum(ts),
+													IntervalPGetDatum(span)));
+	/* Convert TimestampTz back and carry nanoseconds. */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* 1000 + ns % 1000;
+   return generate_uuidv7(ns);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +568,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +587,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +614,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +622,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..3353e9d6e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index e66ecf34cd..9acf547b64 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -194,4 +194,26 @@ GetTimerFrequency(void)
 #define INSTR_TIME_GET_MICROSEC(t) \
 	(INSTR_TIME_GET_NANOSEC(t) / NS_PER_US)
 
+#ifndef WIN32
+
+/*
+ * Read real time with high resolution. Trimmed to microseconds on MacOS.
+ */
+static inline uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+#else /* WIN32 */
+
+/*
+ * Function to read real time with all available precision.
+ * Prototype-only, implementation in win32gettimeofday.c
+ */
+uint64 get_real_time_ns();
+
+#endif
+
 #endif							/* INSTR_TIME_H */
diff --git a/src/port/win32gettimeofday.c b/src/port/win32gettimeofday.c
index 1e00f7ee14..ec46cc00fd 100644
--- a/src/port/win32gettimeofday.c
+++ b/src/port/win32gettimeofday.c
@@ -41,6 +41,7 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
  */
 #define FILETIME_UNITS_PER_SEC	10000000L
 #define FILETIME_UNITS_PER_USEC 10
+#define FILETIME_UNITS_TO_NS	100L
 
 
 /*
@@ -73,3 +74,19 @@ gettimeofday(struct timeval *tp, void *tzp)
 
 	return 0;
 }
+
+/*
+ * Function to read real time with all available precision.
+ */
+uint64
+get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..aa6224e81b 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..eec7f160f8 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,12 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)

#165

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Andrey M. Borodin (#164)

1 attachment(s)

Re: UUID v7

On 17 Nov 2024, at 00:06, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

v31

There was a problem with the MinGW build. I've considered all options and decided to include all necessary stuff into instr_time.h. So much fuss for these 2 bits about nanoseconds :)

Best regards, Andrey Borodin.

Attachments:

v32-0001-Implement-UUID-v7.patch

From 4b540ae77fd3ee215e0482e520059f36d1ac0c15 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v32] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 163 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/include/portability/instr_time.h     |  41 ++++++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  41 +++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 8 files changed, 283 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 73979f20ff..03161b3f87 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..d909ae4907 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -17,6 +17,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
+#include "portability/instr_time.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
@@ -37,6 +38,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline uint64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +404,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Set magic numbers for a UUID variant 3
+ * https://www.rfc-editor.org/rfc/rfc9562
+ */
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,20 +434,129 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 */
+	uuid_set_version(uuid, 4);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Acquire nanosecond reading and ensure it is ascending (on this backend)
+ */
+static inline uint64 get_real_time_ns_ascending()
+{
+	static uint64 previous_ns = 0;
+	uint64 ns = get_real_time_ns();
+
+	/* minimum amount of ns that guarantees step of UUID increased clock precision */
+#define NS_PER_MS INT64CONST(1000000)
+#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+static Datum
+generate_uuidv7(uint64 ns)
+{
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 7);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Entry point for uuidv7()
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(get_real_time_ns_ascending());
+}
+
+/*
+ * Entry point for uuidv7(interval)
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	uint64 ns = get_real_time_ns_ascending();
+	/*
+	 * We are given a time shift interval as an argument.
+	 * The interval represents days, months and years, that are not fixed
+	 * number of nanoseconds. To make correct computations we call
+	 * timestamptz_pl_interval() with corresponding logic. This logic is
+	 * implemented with microsecond precision. So we carry nanoseconds
+	 * between computations.
+	 */
+	Interval *span = PG_GETARG_INTERVAL_P(0);
+	/* Convert time part of UUID to Timestamptz (us since Postgres epoch) */
+	TimestampTz ts = (TimestampTz) (ns / 1000) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													TimestampTzGetDatum(ts),
+													IntervalPGetDatum(span)));
+	/* Convert TimestampTz back and carry nanoseconds. */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* 1000 + ns % 1000;
+   return generate_uuidv7(ns);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +566,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +585,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +612,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +620,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..3353e9d6e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index e66ecf34cd..baf5678f8c 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -194,4 +194,45 @@ GetTimerFrequency(void)
 #define INSTR_TIME_GET_MICROSEC(t) \
 	(INSTR_TIME_GET_NANOSEC(t) / NS_PER_US)
 
+#ifdef WIN32
+
+#include <sysinfoapi.h>
+
+#include <sys/time.h>
+
+/* FILETIME of Jan 1 1970 00:00:00, the PostgreSQL epoch */
+static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
+
+#define FILETIME_UNITS_TO_NS	100L
+
+/*
+ * Read real time with high resolution. Trimmed to 100ns.
+ */
+static inline uint64 get_real_time_ns()
+{
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	return (ularge.QuadPart - epoch) * FILETIME_UNITS_TO_NS;
+}
+
+#else /* not WIN32 */
+
+/*
+ * Read real time with high resolution. Trimmed to microseconds on MacOS.
+ */
+static inline uint64 get_real_time_ns()
+{
+	struct timespec tmp;
+
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	return tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+}
+
+#endif
+
 #endif							/* INSTR_TIME_H */
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..aa6224e81b 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..eec7f160f8 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,12 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)
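
For readers who want to check the v7 regression test vector by hand, the following is a minimal standalone sketch, assuming only the byte layout used in the patch (48-bit big-endian unix_ts_ms in bytes 0..5, version in the top four bits of byte 6). The function names are hypothetical and do not exist in PostgreSQL. The expected value 1645557742000 ms is the hex field 0x017F22E279B0 converted to decimal.

```c
#include <assert.h>
#include <stdint.h>

/* Read the 48-bit big-endian unix_ts_ms field from the first six bytes. */
static uint64_t
uuidv7_unix_ts_ms(const unsigned char uuid[16])
{
	uint64_t	ms = 0;

	for (int i = 0; i < 6; i++)
		ms = (ms << 8) | uuid[i];
	return ms;
}

/* The version is stored in the top four bits of byte 6. */
static int
uuid_version(const unsigned char uuid[16])
{
	return uuid[6] >> 4;
}

/* Self-check against the test vector 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. */
static void
check_v7_test_vector(void)
{
	static const unsigned char v[16] = {
		0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x7C, 0xC3,
		0x98, 0xC4, 0xDC, 0x0C, 0x0C, 0x07, 0x39, 0x8F
	};

	assert(uuid_version(v) == 7);
	assert((v[8] & 0xc0) == 0x80);	/* same variant check as the patch */
	assert(uuidv7_unix_ts_ms(v) == 1645557742000ULL);
}
```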

#166

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#165)

Re: UUID v7

On Sun, Nov 17, 2024 at 10:39 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 17 Nov 2024, at 00:06, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

v31

There was a problem with the MinGW build. I've considered all the options and decided to put everything necessary into instr_time.h. So much fuss over these 2 bits of nanosecond precision :)

I realized that what we do in get_real_time_ns() on Windows is
essentially the same as what we do in gettimeofday(). Probably we can
just do either clock_gettime() with CLOCK_REALTIME on unix-like
systems and gettimeofday() on Windows, and then don't change anything
in instr_time.h? We need to explain why we don't use gettimeofday() on
unix-like systems in get_real_time_ns() function.
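
As a rough illustration of the difference in question, here is a minimal sketch for a POSIX system. The helper names realtime_ns and realtime_us are hypothetical. Both calls read the wall clock, but struct timeval only has room for microseconds, while struct timespec carries nanoseconds (how fine the reported values really are depends on the platform).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Wall-clock time in nanoseconds since the Unix epoch, via clock_gettime(). */
static uint64_t
realtime_ns(void)
{
	struct timespec ts;
	int			rc = clock_gettime(CLOCK_REALTIME, &ts);

	assert(rc == 0);
	(void) rc;					/* silence unused warning when NDEBUG is set */
	return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* The same clock via gettimeofday(): microseconds at best. */
static uint64_t
realtime_us(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return (uint64_t) tv.tv_sec * 1000000ULL + (uint64_t) tv.tv_usec;
}
```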

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#167

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#166)

1 attachment(s)

Re: UUID v7

On 19 Nov 2024, at 02:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I realized that what we do in get_real_time_ns() on Windows is
essentially the same as what we do in gettimeofday(). Probably we can
just do either clock_gettime() with CLOCK_REALTIME on unix-like
systems and gettimeofday() on Windows, and then don't change anything
in instr_time.h? We need to explain why we don't use gettimeofday() on
unix-like systems in get_real_time_ns() function.

Done.
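
For reference, the conversion that a Windows clock_gettime() has to perform can be sketched without any Windows API. This is only an illustration: the struct and function names are hypothetical, the constants are the ones used in win32gettimeofday.c, and the nanosecond part is my assumption about the arithmetic (remainder of the 100 ns ticks times 100), not a quote from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 1970-01-01 00:00:00 expressed in 100 ns ticks since 1601-01-01 */
#define FILETIME_UNIX_EPOCH		116444736000000000ULL
#define FILETIME_UNITS_PER_SEC	10000000ULL
#define FILETIME_UNITS_TO_NS	100ULL

/* Hypothetical stand-in for struct timespec */
struct unix_time
{
	int64_t		sec;
	long		nsec;
};

/* Convert a FILETIME tick count to seconds and nanoseconds since 1970. */
static struct unix_time
filetime_to_unix(uint64_t filetime_ticks)
{
	struct unix_time out;
	uint64_t	since_epoch;

	assert(filetime_ticks >= FILETIME_UNIX_EPOCH);
	since_epoch = filetime_ticks - FILETIME_UNIX_EPOCH;
	out.sec = (int64_t) (since_epoch / FILETIME_UNITS_PER_SEC);
	out.nsec = (long) ((since_epoch % FILETIME_UNITS_PER_SEC) * FILETIME_UNITS_TO_NS);
	return out;
}
```

Since FILETIME counts in 100 ns units, the result can never be finer than 100 ns, which is the "trimmed to 100ns" caveat from the v32 comment.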

Best regards, Andrey Borodin.

Attachments:

v33-0001-Implement-UUID-v7.patch (application/octet-stream)

From 19527327ff6be78e50d275e95beaf3ffd8365542 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v33] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds function for UUID generation. Most important function here
is uuidv7() which generates new UUID according to the new standard.
For code readability this commit adds alias uuidv4() to function
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 167 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/include/port/win32_port.h            |  14 ++
 src/port/win32gettimeofday.c             |  22 +++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  41 +++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 9 files changed, 282 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1c..3e6751d64cc 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 73979f20fff..03161b3f874 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc4..f8b8b590216 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -17,6 +17,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
+#include "portability/instr_time.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
@@ -37,6 +38,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline uint64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +404,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Set magic numbers for a UUID variant 3
+ * https://www.rfc-editor.org/rfc/rfc9562
+ */
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,20 +434,133 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 4);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Acquire nanosecond reading and ensure it is ascending (on this backend)
+ */
+static inline uint64 get_real_time_ns_ascending()
+{
+	static uint64 previous_ns = 0;
+	struct timespec tmp;
+	uint64 ns;
+
+	/* We use some bits of nanosecond precision, so we cannot resort to gettimeofday() */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+
+	/* minimum amount of ns that guarantees step of UUID increased clock precision */
+#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+static Datum
+generate_uuidv7(uint64 ns)
+{
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Entry point for uuidv7()
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(get_real_time_ns_ascending());
+}
+
+/*
+ * Entry point for uuidv7(interval)
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	uint64 ns = get_real_time_ns_ascending();
+	/*
+	 * We are given a time shift interval as an argument.
+	 * The interval represents days, months and years, that are not fixed
+	 * number of nanoseconds. To make correct computations we call
+	 * timestamptz_pl_interval() with corresponding logic. This logic is
+	 * implemented with microsecond precision. So we carry nanoseconds
+	 * between computations.
+	 */
+	Interval *span = PG_GETARG_INTERVAL_P(0);
+	/* Convert time part of UUID to Timestamptz (us since Postgres epoch) */
+	TimestampTz ts = (TimestampTz) (ns / 1000) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													TimestampTzGetDatum(ts),
+													IntervalPGetDatum(span)));
+	/* Convert TimestampTz back and carry nanoseconds. */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* 1000 + ns % 1000;
+   return generate_uuidv7(ns);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +570,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +589,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +616,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +624,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd382..3353e9d6e36 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 7789e0431aa..d343e6c1875 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -184,6 +184,20 @@
 #ifdef _MSC_VER
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval *tp, void *tzp);
+
+/*
+ * Windows implementation is limited to CLOCK_REALTIME
+ */
+typedef enum {
+	CLOCK_REALTIME
+} clockid_t;
+
+#include <time.h> /* for timespec */
+
+extern int clock_gettime(clockid_t clock_id, struct timespec *tp);
+#else
+/* MinGW */
+#include "pthread_time.h"
 #endif
 
 /* for setitimer in backend/port/win32/timer.c */
diff --git a/src/port/win32gettimeofday.c b/src/port/win32gettimeofday.c
index 1e00f7ee149..93ec3bf731c 100644
--- a/src/port/win32gettimeofday.c
+++ b/src/port/win32gettimeofday.c
@@ -41,6 +41,7 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
  */
 #define FILETIME_UNITS_PER_SEC	10000000L
 #define FILETIME_UNITS_PER_USEC 10
+#define FILETIME_UNITS_TO_NS	100L
 
 
 /*
@@ -73,3 +74,24 @@ gettimeofday(struct timeval *tp, void *tzp)
 
 	return 0;
 }
+
+/*
+ * This function is ported for UUID purposes.
+ */
+int
+clock_gettime(clockid_t clock_id, struct timespec *tp)
+{
+	Assert(clock_id == CLOCK_REALTIME);
+
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	tp->tv_sec = (long) ((ularge.QuadPart - epoch) / FILETIME_UNITS_PER_SEC);
+	tp->tv_nsec = (long) (((ularge.QuadPart - epoch) % FILETIME_UNITS_PER_SEC)
+						  * FILETIME_UNITS_TO_NS);
+
+	return 0;
+}
\ No newline at end of file
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d2..43e7180a161 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed31..aa6224e81bb 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a7..eec7f160f81 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,12 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.42.0
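
For readers following the version 7 branch of uuid_extract_timestamp() above, here is a minimal standalone sketch of the same byte arithmetic. The helper names uuid_version() and uuidv7_unix_ms() are hypothetical and are not part of the patch. The sketch returns Unix milliseconds only, without the conversion to the Postgres epoch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: version is the high nibble of byte 6. */
static int
uuid_version(const unsigned char data[16])
{
    return data[6] >> 4;
}

/*
 * Hypothetical helper: read the 48-bit big-endian unix_ts_ms field
 * (bytes 0..5) of a version 7 UUID, in milliseconds since the Unix epoch.
 */
static uint64_t
uuidv7_unix_ms(const unsigned char data[16])
{
    uint64_t    tms = 0;

    assert(uuid_version(data) == 7);
    for (int i = 0; i < 6; i++)
        tms = (tms << 8) | data[i];
    return tms;
}
```

Applied to the v7 test vector from the regression test (017F22E2-79B0-7CC3-98C4-DC0C0C07398F), the first six bytes give 0x017F22E279B0, which is 1645557742000 ms.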

#168

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Andrey M. Borodin (#167)

2 attachment(s)

Re: UUID v7

On 19 Nov 2024, at 14:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Done.

Here's v33 intact, plus one more patch that adds 2 bits of entropy on macOS (to compensate for the lack of nanosecond precision).
What do you think?

Best regards, Andrey Borodin.

Attachments:

v33-0001-Implement-UUID-v7.patch (application/octet-stream)

From 19527327ff6be78e50d275e95beaf3ffd8365542 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v33] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability, this commit also adds uuidv4() as an alias for
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 167 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/include/port/win32_port.h            |  14 ++
 src/port/win32gettimeofday.c             |  22 +++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  41 +++++-
 src/test/regress/sql/uuid.sql            |  18 ++-
 9 files changed, 282 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 73979f20ff..03161b3f87 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with millisecond
+   precision + 12-bit sub-millisecond timestamp fraction + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..f8b8b59021 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -17,6 +17,7 @@
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
+#include "portability/instr_time.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
@@ -37,6 +38,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline uint64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +404,24 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Set magic numbers for a UUID variant 3
+ * https://www.rfc-editor.org/rfc/rfc9562
+ */
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -413,20 +434,133 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 
 	/*
 	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 4);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Aquire nanosecond reading and ensure it is ascending (on this backend)
+ */
+static inline uint64 get_real_time_ns_ascending()
+{
+	static uint64 previous_ns = 0;
+	struct timespec tmp;
+	uint64 ns;
+
+	/* We use some bits of nanosecond precision, so we cannot resort to gettimeofday() */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+
+	/* minimum amount of ns that guarantees step of UUID increased clock precision */
+#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
+	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
+		ns = previous_ns + SUB_MILLISECOND_STEP;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562.
+ *
+ * Monotonicity (regarding generation on given backend) is ensured with method
+ * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
+ * Usage of pg_testtime indicates that such precision is available on most
+ * systems. If timestamp is not advancing between two consecutive UUID
+ * generations, previous timestamp is incremented and used instead of current
+ * timestamp.
+ */
+static Datum
+generate_uuidv7(uint64 ns)
+{
+	pg_uuid_t	*uuid = palloc(UUID_LEN);
+	uint64		 unix_ts_ms;
+	uint16 		 increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;
+
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
+	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Entry point for uuidv7()
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+   return generate_uuidv7(get_real_time_ns_ascending());
+}
+
+/*
+ * Entry point for uuidv7(interval)
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	uint64 ns = get_real_time_ns_ascending();
+	/*
+	 * We are given a time shift interval as an argument.
+	 * The interval can contain days, months and years, which are not a fixed
+	 * number of nanoseconds. To make correct computations we call
+	 * timestamptz_pl_interval() with corresponding logic. This logic is
+	 * implemented with microsecond precision. So we carry nanoseconds
+	 * between computations.
+	 */
+	Interval *span = PG_GETARG_INTERVAL_P(0);
+	/* Convert time part of UUID to TimestampTz (us since Postgres epoch) */
+	TimestampTz ts = (TimestampTz) (ns / 1000) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+													TimestampTzGetDatum(ts),
+													IntervalPGetDatum(span)));
+	/* Convert TimestampTz back and carry nanoseconds. */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* 1000 + ns % 1000;
+   return generate_uuidv7(ns);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +570,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +589,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * 1000) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +616,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +624,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..3353e9d6e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 7789e0431a..d343e6c187 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -184,6 +184,20 @@
 #ifdef _MSC_VER
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval *tp, void *tzp);
+
+/*
+ * Windows implementation is limited to CLOCK_REALTIME
+ */
+typedef enum {
+	CLOCK_REALTIME
+} clockid_t;
+
+#include <time.h> /* for timespec */
+
+extern int clock_gettime(clockid_t clock_id, struct timespec *tp);
+#else
+/* MinGW */
+#include "pthread_time.h"
 #endif
 
 /* for setitimer in backend/port/win32/timer.c */
diff --git a/src/port/win32gettimeofday.c b/src/port/win32gettimeofday.c
index 1e00f7ee14..93ec3bf731 100644
--- a/src/port/win32gettimeofday.c
+++ b/src/port/win32gettimeofday.c
@@ -41,6 +41,7 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
  */
 #define FILETIME_UNITS_PER_SEC	10000000L
 #define FILETIME_UNITS_PER_USEC 10
+#define FILETIME_UNITS_TO_NS	100L
 
 
 /*
@@ -73,3 +74,24 @@ gettimeofday(struct timeval *tp, void *tzp)
 
 	return 0;
 }
+
+/*
+ * This function is ported for UUID purposes.
+ */
+int
+clock_gettime(clockid_t clock_id, struct timespec *tp)
+{
+	Assert(clock_id == CLOCK_REALTIME);
+
+	FILETIME	file_time;
+	ULARGE_INTEGER ularge;
+	GetSystemTimePreciseAsFileTime(&file_time);
+	ularge.LowPart = file_time.dwLowDateTime;
+	ularge.HighPart = file_time.dwHighDateTime;
+
+	tp->tv_sec = (long) ((ularge.QuadPart - epoch) / FILETIME_UNITS_PER_SEC);
+	tp->tv_nsec = (long) (((ularge.QuadPart - epoch) % FILETIME_UNITS_PER_SEC)
+						  * FILETIME_UNITS_TO_NS);
+
+	return 0;
+}
\ No newline at end of file
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..aa6224e81b 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -168,6 +168,27 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +209,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..eec7f160f8 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -85,6 +85,19 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
 
 -- extract functions
 
@@ -92,9 +105,12 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)
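
To illustrate the generation side of the patch above (the ascending clock and the 12-bit sub-millisecond fraction stored in "rand_a"), here is a minimal standalone sketch. The names ascending_ns() and fill_v7_time() are hypothetical. Unlike the patch, the sketch takes the previous reading as a parameter rather than a static variable, sets the version nibble inline, and leaves out the random bytes and the variant bits.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS 1000000
/* smallest ns step that always advances the 12-bit sub-ms fraction (245 ns) */
#define SUBMS_STEP ((NS_PER_MS / (1 << 12)) + 1)

/* Hypothetical: force a clock reading to be ahead of the previous one. */
static uint64_t
ascending_ns(uint64_t *previous_ns, uint64_t now_ns)
{
    if (*previous_ns + SUBMS_STEP >= now_ns)
        now_ns = *previous_ns + SUBMS_STEP;
    *previous_ns = now_ns;
    return now_ns;
}

/* Hypothetical: fill bytes 0..7 of a v7 UUID from a nanosecond timestamp. */
static void
fill_v7_time(unsigned char data[16], uint64_t ns)
{
    uint64_t    ms = ns / NS_PER_MS;
    uint16_t    frac = (uint16_t) (((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS);

    assert(ms < (UINT64_C(1) << 48));   /* unix_ts_ms is a 48-bit field */

    /* 48-bit big-endian millisecond timestamp */
    for (int i = 0; i < 6; i++)
        data[i] = (unsigned char) (ms >> (40 - 8 * i));

    /* version 7 in the high nibble, then the 12-bit fraction */
    data[6] = (unsigned char) (0x70 | (frac >> 8));
    data[7] = (unsigned char) frac;
}
```

Two calls with the same clock reading then differ by SUBMS_STEP nanoseconds, which is enough to move the 12-bit fraction by at least one, so the first eight bytes compare as strictly ascending within one backend.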

v34-0002-Mix-in-2-bits-of-entropy-into-timestampt-of-UUID.patch (application/octet-stream)

From 2deed80fbd15edeff04f9c5925b7897715cfe437 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Tue, 19 Nov 2024 22:41:32 +0500
Subject: [PATCH v34 2/2] Mix in 2 bits of entropy into timestampt of UUID on
 MacOS

---
 src/backend/utils/adt/uuid.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index f8b8b59021..b251f2b10d 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -503,6 +503,17 @@ generate_uuidv7(uint64 ns)
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
 
+#if defined(__darwin__)
+	/*
+	 * On macOS real time is truncated to microseconds. Thus, the 2 least
+	 * significant bits of increased_clock_precision are neither random
+	 * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
+	 * are dependent on other time-specific bits, thus they do not contribute
+	 * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
 	/*
 	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
 	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
-- 
2.39.5 (Apple Git-154)
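
The one-line mix in the patch above can be looked at in isolation. Below is a minimal sketch with a hypothetical helper, mix_low2(), that performs the same XOR. In the patch the hunk is placed before uuid_set_version(), so data[8] still holds raw random bits at that point and its top two bits are later overwritten by the variant.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the patch's expression
 * data[7] ^ (data[8] >> 6): XOR the top two bits of a random byte into
 * the two least significant bits of the low "rand_a" byte.
 */
static unsigned char
mix_low2(unsigned char rand_a_low, unsigned char random_byte)
{
    unsigned char two_bits = (unsigned char) (random_byte >> 6);   /* 0..3 */

    assert(two_bits <= 3);
    return (unsigned char) (rand_a_low ^ two_bits);
}
```

Only the two lowest bits of the byte can change. The upper six bits of data[7], and so the coarser part of the sub-millisecond fraction, are left as computed from the clock.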

#169

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#168)

1 attachment(s)

Re: UUID v7

On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 19 Nov 2024, at 14:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Done.

Here's v33 intact, plus one more patch that adds 2 bits of entropy on macOS (to compensate for the lack of nanosecond precision).
What do you think?

Thank you for updating the patch!

I've reviewed the v33 patch and made some changes, mostly cosmetic.
Please review them and see whether they are acceptable.

I have one question about the additional patch:

+#if defined(__darwin__)
+ /*
+ * On macOS real time is truncated to microseconds. Thus, the 2 least
+ * significant bits of increased_clock_precision are neither random
+ * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
+ * are dependent on other time-specific bits, thus they do not contribute
+ * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
+ */
+ uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif

I thought that all 12 bits in "rand_a" are actually time-dependent,
since we store the sub-millisecond part as a 1/4096 fraction of a
millisecond. Am I missing something?
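
To make the question concrete, here is a minimal sketch (not part of either patch; the helper names are hypothetical). It computes the 12-bit fraction with the same arithmetic as generate_uuidv7() and counts how many distinct values occur when the clock only reports whole microseconds. It assumes raw microsecond-truncated readings and ignores the minimal-step adjustment in get_real_time_ns_ascending().

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS 1000000
#define NS_PER_US 1000

/* 12-bit sub-millisecond fraction, same arithmetic as generate_uuidv7() */
static uint16_t
subms_fraction(uint64_t ns)
{
    return (uint16_t) (((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS);
}

/* Count distinct fractions when the clock ticks in whole microseconds. */
static int
count_reachable_fractions(void)
{
    unsigned char seen[1 << 12] = {0};
    int         count = 0;

    for (uint64_t us = 0; us < 1000; us++)
    {
        uint16_t    f = subms_fraction(us * NS_PER_US);

        assert(f < (1 << 12));
        if (!seen[f])
        {
            seen[f] = 1;
            count++;
        }
    }
    return count;
}
```

Under that assumption every 12-bit value is still fully determined by the clock reading, but only 1000 of the 4096 possible values occur, since there are just 1000 distinct microseconds within a millisecond.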

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

change_v33.patch (application/octet-stream)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index f8b8b590216..ac094ac5901 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,17 +13,30 @@
 
 #include "postgres.h"
 
+#include <time.h> /* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
 #include "port/pg_bswap.h"
-#include "portability/instr_time.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * In UUID version 7, we use 12 bits in "rand_a" to store 1/4096 fractions of
+ * sub-millisecond. This is the minimum amount of nanoseconds that guarantees
+ * step of UUID increased clock precision.
+ */
+#define SUBMS_MINIMAL_STEP ((NS_PER_MS / (1 << 12)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -39,7 +52,7 @@ static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
 static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
-static inline uint64 get_real_time_ns_ascending();
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -404,14 +417,13 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
-/*
- * Set magic numbers for a UUID variant 3
- * https://www.rfc-editor.org/rfc/rfc9562
- */
-static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
 {
 	/* set version field, top four bits */
 	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
 	/* set variant field, top two bits are 1, 0 */
 	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
 }
@@ -433,8 +445,8 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and
+	 * variant, see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
 	 */
 	uuid_set_version(uuid, 4);
 
@@ -442,44 +454,63 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 }
 
 /*
- * Aquire nanosecond reading and ensure it is ascending (on this backend)
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
  */
-static inline uint64 get_real_time_ns_ascending()
+static inline int64
+get_real_time_ns_ascending()
 {
-	static uint64 previous_ns = 0;
+	static int64 previous_ns = 0;
+	int64 ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	WIN32
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
 	struct timespec tmp;
-	uint64 ns;
 
-	/* We use some bits of nanosecond precision, so we cannot resort to gettimeofday() */
+	/*
+	 * We don't use gettimeofday() where available, instead use clock_gettime()
+	 * with CLOCK_REALTIME in order to get a high-precision (nanoseconds) real
+	 * timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
 	clock_gettime(CLOCK_REALTIME, &tmp);
-	ns = tmp.tv_sec * 1000000000L + tmp.tv_nsec;
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
 
-	/* minimum amount of ns that guarantees step of UUID increased clock precision */
-#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
-	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
-		ns = previous_ns + SUB_MILLISECOND_STEP;
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP;
 	previous_ns = ns;
 
 	return ns;
 }
 
 /*
- * Generate UUID version 7 per RFC 9562.
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
  *
- * Monotonicity (regarding generation on given backend) is ensured with method
- * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
- * We use 12 bits in "rand_a" bits to store 1/4096 fractions of millisecond.
- * Usage of pg_testtime indicates that such precision is available on most
- * systems. If timestamp is not advancing between two consecutive UUID
- * generations, previous timestamp is incremented and used instead of current
- * timestamp.
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^-12) fraction of sub-millisecond precision.
  */
-static Datum
-generate_uuidv7(uint64 ns)
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)
 {
 	pg_uuid_t	*uuid = palloc(UUID_LEN);
-	uint64		 unix_ts_ms;
-	uint16 		 increased_clock_precision;
+	int64		 unix_ts_ms;
+	int32 		 increased_clock_precision;
 
 	unix_ts_ms = ns / NS_PER_MS;
 
@@ -494,6 +525,7 @@ generate_uuidv7(uint64 ns)
 	/* sub-millisecond timestamp fraction (12 bits) */
 	increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;
 
+	/* Fill the increased clock precision to "rand_a" bits */
 	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
 	uuid->data[7] = (unsigned char) (increased_clock_precision);
 
@@ -504,51 +536,62 @@ generate_uuidv7(uint64 ns)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
-	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and
+	 * variant, see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
 	 */
 	uuid_set_version(uuid, 7);
 
-	PG_RETURN_UUID_P(uuid);
+	return uuid;
 }
 
 /*
- * Entry point for uuidv7()
+ * Generate UUID version 7 with the current timestamp.
  */
 Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
-   return generate_uuidv7(get_real_time_ns_ascending());
+	pg_uuid_t *uuid = generate_uuidv7(get_real_time_ns_ascending());
+
+	PG_RETURN_UUID_P(uuid);
 }
 
 /*
- * Entry point for uuidv7(interval)
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
  */
 Datum
 uuidv7_interval(PG_FUNCTION_ARGS)
 {
-	uint64 ns = get_real_time_ns_ascending();
+	Interval *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t *uuid;
+	int64 ns = get_real_time_ns_ascending();
+
 	/*
-	 * We are given a time shift interval as an argument.
-	 * The interval represent days, monthes and years, that are not fixed
-	 * number of nanoseconds. To make correct computations we call
-	 * timestamptz_pl_interval() with corresponding logic. This logic is
-	 * implemented with microsecond precision. So we carry nanoseconds
-	 * between computations.
+	 * Shift the current timestamp by the given interval. To calculate the
+	 * time shift correctly, we convert the UNIX epoch to TimestampTz and use
+	 * timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry back the nanoseconds.
 	 */
-	Interval *span = PG_GETARG_INTERVAL_P(0);
-	/* Convert time part of UUID to Timestamptz (ms since Postgres epoch) */
-	TimestampTz ts = (TimestampTz) (ns / 1000) -
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
 		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 	/* Copmute time shift */
 	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
-													TimestampTzGetDatum(ts),
-													IntervalPGetDatum(span)));
-	/* Convert TimestampTz back and carry nanoseconds. */
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to a UNIX epoch and carry back
+	 * nanoseconds.
+	 */
 	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
-		* 1000 + ns % 1000;
-   return generate_uuidv7(ns);
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate a UUID */
+	uuid = generate_uuidv7(ns);
+
+	PG_RETURN_UUID_P(uuid);
 }
 
 /*
@@ -603,7 +646,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 			+ (((uint64) uuid->data[0]) << 40);
 
 		/* convert ms to us, then adjust */
-		ts = (TimestampTz) (tms * 1000) -
+		ts = (TimestampTz) (tms * NS_PER_US) -
 			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index d343e6c1875..7789e0431aa 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -184,20 +184,6 @@
 #ifdef _MSC_VER
 /* Last parameter not used */
 extern int	gettimeofday(struct timeval *tp, void *tzp);
-
-/*
- * Windows implementation is limited to CLOCK_REALTIME
- */
-typedef enum {
-	CLOCK_REALTIME
-} clockid_t;
-
-#include <time.h> /* for timespec */
-
-extern int clock_gettime(clockid_t clock_id, struct timespec *tp);
-#else
-/* MinGW */
-#include "pthread_time.h"
 #endif
 
 /* for setitimer in backend/port/win32/timer.c */
diff --git a/src/port/win32gettimeofday.c b/src/port/win32gettimeofday.c
index 93ec3bf731c..1e00f7ee149 100644
--- a/src/port/win32gettimeofday.c
+++ b/src/port/win32gettimeofday.c
@@ -41,7 +41,6 @@ static const unsigned __int64 epoch = UINT64CONST(116444736000000000);
  */
 #define FILETIME_UNITS_PER_SEC	10000000L
 #define FILETIME_UNITS_PER_USEC 10
-#define FILETIME_UNITS_TO_NS	100L
 
 
 /*
@@ -74,24 +73,3 @@ gettimeofday(struct timeval *tp, void *tzp)
 
 	return 0;
 }
-
-/*
- * This function is ported for UUID purposes.
- */
-int
-clock_gettime(clockid_t clock_id, struct timespec *tp)
-{
-	Assert(clock_id == CLOCK_REALTIME);
-
-	FILETIME	file_time;
-	ULARGE_INTEGER ularge;
-	GetSystemTimePreciseAsFileTime(&file_time);
-	ularge.LowPart = file_time.dwLowDateTime;
-	ularge.HighPart = file_time.dwHighDateTime;
-
-	tp->tv_sec = (long) ((ularge.QuadPart - epoch) / FILETIME_UNITS_PER_SEC);
-	tp->tv_nsec = (long) (((ularge.QuadPart - epoch) % FILETIME_UNITS_PER_SEC)
-						  * FILETIME_UNITS_TO_NS);
-
-	return 0;
-}
\ No newline at end of file
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index aa6224e81bb..bd83f6b0763 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -189,6 +194,14 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      3
 (1 row)
 
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -247,4 +260,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index eec7f160f81..8e54217a75c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -98,6 +103,9 @@ INSERT INTO guid1 (guid_field) VALUES (uuidv7());
 INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -116,4 +124,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;

#170

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#169)

Re: UUID v7

On 20 Nov 2024, at 00:06, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 19 Nov 2024, at 14:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Done.

Here's v33 intact + one more patch to add 2 bits of entropy on macOS (to compensate for the lack of nanoseconds).
What do you think?

Thank you for updating the patch!

I've reviewed the v33 patch and made some changes mostly for cosmetic
things. Please review it to see if we accept these changes.

Your changes look good to me. I particularly like the sortability test.
I see that you removed the implementation of clock_gettime() for Windows. Well, this makes sense.

I have one question about the additional patch:

+#if defined(__darwin__)
+ /*
+ * On MacOS real time is truncated to microseconds. Thus, 2 least
+ * significant bits of increased_clock_precision are neither random
+ * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
+ * are dependent on other time-specific bits, thus they do not contribute
+ * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
+ */
+ uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif

I thought that the whole 12 bits in "rand_a" is actually
time-dependent since we store 1/4096 fraction of sub-milliseconds. Am
I missing something?

We have 12 bits in increased_clock_precision, but only 1000 possible values of these bits. The 2 least significant bits are determined by the other 10 bits.
These bits are not equal to 0; they do change.
True, these bits are time-dependent in the sense that they are computed from the full timestamp. I wanted to express the fact that the timestamp cannot be altered in such a way that only these 2 bits change.
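
To illustrate the claim above, here is a minimal sketch (not part of any patch in this thread). It assumes a clock truncated to whole microseconds and reuses the 12-bit fraction formula from generate_uuidv7(); the helper names are hypothetical. It counts how many distinct 12-bit values can occur within one millisecond and checks whether the upper 10 bits already determine the lower 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_MS INT64_C(1000000)
#define NS_PER_US INT64_C(1000)

/* 12-bit sub-millisecond fraction, same formula as in the patch */
static int32_t
subms_fraction(int64_t ns)
{
	assert(ns >= 0);
	return (int32_t) (((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS);
}

/* Count distinct fractions produced by microsecond-truncated timestamps */
static int
count_distinct_fractions_us_clock(void)
{
	bool		seen[1 << 12] = {false};
	int			distinct = 0;

	for (int64_t us = 0; us < 1000; us++)
	{
		int32_t		f = subms_fraction(us * NS_PER_US);

		if (!seen[f])
		{
			seen[f] = true;
			distinct++;
		}
	}
	return distinct;
}

/* True if every value of the upper 10 bits maps to a single low-2-bit value */
static bool
low_bits_determined_by_high_bits(void)
{
	int			low_for_high[1 << 10];

	for (int i = 0; i < (1 << 10); i++)
		low_for_high[i] = -1;

	for (int64_t us = 0; us < 1000; us++)
	{
		int32_t		f = subms_fraction(us * NS_PER_US);
		int			high = f >> 2;
		int			low = f & 0x3;

		if (low_for_high[high] != -1 && low_for_high[high] != low)
			return false;
		low_for_high[high] = low;
	}
	return true;
}
```

Under these assumptions the first helper returns 1000 (out of 4096 representable values) and the second returns true, which matches the statement that the 2 low bits carry no information of their own on such a clock.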

Best regards, Andrey Borodin.

#171

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#170)

Re: UUID v7

On Tue, Nov 19, 2024 at 7:54 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 20 Nov 2024, at 00:06, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 19, 2024 at 9:45 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 19 Nov 2024, at 14:31, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

Done.

Here's v33 intact + one more patch to add 2 bits of entropy on macOS (to compensate for the lack of nanoseconds).
What do you think?

Thank you for updating the patch!

I've reviewed the v33 patch and made some changes mostly for cosmetic
things. Please review it to see if we accept these changes.

Your changes look good to me. I particularly like the sortability test.
I see that you removed the implementation of clock_gettime() for Windows. Well, this makes sense.
I have one question about the additional patch:
+#if defined(__darwin__)
+ /*
+ * On MacOS real time is truncated to microseconds. Thus, 2 least
+ * significant bits of increased_clock_precision are neither random
+ * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
+ * are dependent on other time-specific bits, thus they do not contribute
+ * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
+ */
+ uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
I thought that the whole 12 bits in "rand_a" is actually
time-dependent since we store 1/4096 fraction of sub-milliseconds. Am
I missing something?
We have 12 bits in increased_clock_precision, but only 1000 possible values of these bits. The 2 least significant bits are determined by the other 10 bits.
These bits are not equal to 0; they do change.
True, these bits are time-dependent in the sense that they are computed from the full timestamp. I wanted to express the fact that the timestamp cannot be altered in such a way that only these 2 bits change.

Understood the idea. But does replacing the least significant 2 bits
with random 2 bits really not affect monotonicity? The ensured minimal
timestamp step is 245, ((NS_PER_MS / (1 << 12)) + 1), meaning that if
two UUIDs are generated within a microsecond on macOS, the two
timestamps differ by 245 ns. After calculating the increased clock
precision with these two timestamps, they differ only by 1, which
seems to be problematic to me.

Suppose the two timestamps are:

ns1: 1732142033754429000 (Nov 20, 2024 10:33:53.754429000)
ns2: 1732142033754429245 (Nov 20, 2024 10:33:53.754429245)

Their sub-milliseconds are calculated (by multiplying by 4096) to:

subms1: 1757 (0b011011011101)
subms2: 1758 (0b011011011110)

If we replace the least significant bits '01' of subms1 with random
bits '11' and replace '10' of subms2 with '00', we cannot guarantee
the monotonicity.
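
The example above can be reproduced with a minimal sketch (not taken from any patch in this thread; the helper names are hypothetical). It assumes the same 12-bit fraction formula as generate_uuidv7() and models the proposed mixing as an XOR of two random bits into the low bits, mirroring `uuid->data[7] ^ (uuid->data[8] >> 6)`.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS INT64_C(1000000)

/* 12-bit sub-millisecond fraction, same formula as in the patch */
static int32_t
subms_fraction(int64_t ns)
{
	assert(ns >= 0);
	return (int32_t) (((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS);
}

/* XOR two random bits into the two least significant bits of the fraction */
static int32_t
mix_low_bits(int32_t fraction, unsigned random_bits)
{
	return fraction ^ (int32_t) (random_bits & 0x3);
}
```

With ns1 and ns2 from the example, subms_fraction() yields 1757 and 1758. If both UUIDs happen to draw the random bits '10', the mixed values become 1759 and 1756, so the later UUID sorts before the earlier one.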

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#172

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#171)

1 attachment(s)

Re: UUID v7

On 21 Nov 2024, at 02:24, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

But does replacing the least significant 2 bits
with random 2 bits really not affect monotonicity?

You are right. We have to take this into account when calculating monotonicity. Please find attached another version.

Best regards, Andrey Borodin.

Attachments:

v35-0002-Mix-in-2-bits-of-entropy-into-timestampt-of-UUID.patch (application/octet-stream)

From 328032b5ec850e7eacc135e00e8d5f63c23d8058 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Tue, 19 Nov 2024 22:41:32 +0500
Subject: [PATCH v35 2/2] Mix in 2 bits of entropy into timestampt of UUID on
 MacOS

---
 src/backend/utils/adt/uuid.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index f8b8b59021..25cbe62803 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -455,7 +455,12 @@ static inline uint64 get_real_time_ns_ascending()
 	ns = tmp.tv_sec * 1000000000L + tmp.tv_nsec;
 
 	/* minimum amount of ns that guarantees step of UUID increased clock precision */
-#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
+#if defined(__darwin__) || _MSC_VER
+#define SUB_MILLISECOND_BITS 10
+#else
+#define SUB_MILLISECOND_BITS 12
+#endif
+#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << SUB_MILLISECOND_BITS)) + 1)
 	if (previous_ns + SUB_MILLISECOND_STEP >= ns)
 		ns = previous_ns + SUB_MILLISECOND_STEP;
 	previous_ns = ns;
@@ -503,6 +508,17 @@ generate_uuidv7(uint64 ns)
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
 
+#if defined(__darwin__) || _MSC_VER
+	/*
+	 * On MacOS real time is truncated to microseconds. Thus, 2 least
+	 * significant bits of increased_clock_precision are neither random
+	 * (CSPRNG), nor time-dependent (in a sense - truly random). These 2 bits
+	 * are dependent on other time-specific bits, thus they do not contribute
+	 * to uniqueness. To make these bits random we mix in two bits from CSPRNG.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
 	/*
 	 * Set magic numbers for a "version 7" (pseudorandom) UUID, see
 	 * https://www.rfc-editor.org/rfc/rfc9562#name-version-field
-- 
2.39.5 (Apple Git-154)

#173

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#172)

Re: UUID v7

On Thu, Nov 21, 2024 at 1:22 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 21 Nov 2024, at 02:24, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

But does replacing the least significant 2 bits
with random 2 bits really not affect monotonicity?

You are right. We have to take this into account when calculating monotonicity. Please find attached another version.

While it works fine, I think we need a comment for this change:

 -#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
 +#if defined(__darwin__) || _MSC_VER
 +#define SUB_MILLISECOND_BITS 10
 +#else
 +#define SUB_MILLISECOND_BITS 12
 +#endif
 +#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << SUB_MILLISECOND_BITS)) + 1)

because at a glance the reader might think we should use
SUB_MILLISECOND_BITS here too:

+       /* sub-millisecond timestamp fraction (12 bits) */
+       increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;
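
To make the distinction concrete, here is a minimal sketch (not part of the patch; helper names are hypothetical). It assumes the fraction is always stored with 12 bits, while the step is sized so that only the top 10 bits, the ones left untouched by the random mixing, are guaranteed to advance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_MS INT64_C(1000000)
#define FRACTION_BITS 12

/* Minimal step in ns that advances a fraction of the given bit width */
static int64_t
minimal_step_ns(int bits)
{
	assert(bits > 0 && bits <= FRACTION_BITS);
	return NS_PER_MS / (INT64_C(1) << bits) + 1;
}

/* The stored fraction always uses all 12 bits */
static int32_t
subms_fraction(int64_t ns)
{
	return (int32_t) (((ns % NS_PER_MS) * (1 << FRACTION_BITS)) / NS_PER_MS);
}

/*
 * True if, for every start offset within one millisecond, advancing by
 * "step" strictly increases the top "compared_bits" bits of the fraction.
 */
static bool
step_always_advances(int64_t step, int compared_bits)
{
	int			shift = FRACTION_BITS - compared_bits;

	for (int64_t ns = 0; ns + step < NS_PER_MS; ns++)
	{
		if ((subms_fraction(ns + step) >> shift) <= (subms_fraction(ns) >> shift))
			return false;
	}
	return true;
}
```

Here minimal_step_ns(12) is 245 and minimal_step_ns(10) is 977. A 245 ns step always advances the full 12-bit value but not necessarily its top 10 bits, whereas a 977 ns step always advances the top 10 bits.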

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#174

sawada.mshk@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#173)

1 attachment(s)

Re: UUID v7

On Fri, Nov 22, 2024 at 2:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 21, 2024 at 1:22 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 21 Nov 2024, at 02:24, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

But does replacing the least significant 2 bits
with random 2 bits really not affect monotonicity?

You are right. We have to take this into account when calculating monotonicity. Please find attached another version.

While it works fine, I think we need a comment for this change:
-#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << 12)) + 1)
+#if defined(__darwin__) || _MSC_VER
+#define SUB_MILLISECOND_BITS 10
+#else
+#define SUB_MILLISECOND_BITS 12
+#endif
+#define SUB_MILLISECOND_STEP ((NS_PER_MS / (1 << SUB_MILLISECOND_BITS)) + 1)
because at a glance the reader might think we should use
SUB_MILLISECOND_BITS here too:
+       /* sub-millisecond timestamp fraction (12 bits) */
+       increased_clock_precision = ((ns % NS_PER_MS) * (1 << 12)) / NS_PER_MS;

I've attached an updated patch that squashed changes I made for v33.
We're still discussing increasing entropy on Windows and macOS, but
the patch seems to be in good shape.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v36-0001-Implement-UUID-v7.patch (application/octet-stream)

From d93ab6dc33b8623c1ba4aac6faf2f0c1f9cd2473 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v36] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to the new standard.
For code readability, this commit also adds uuidv4() as an alias for
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 213 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  56 +++++-
 src/test/regress/sql/uuid.sql            |  28 ++-
 7 files changed, 314 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1c..3e6751d64cc 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 13ccbe7d78c..a1af74b69cc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with millisecond
+   precision + sub-millisecond timestamp + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc4..b137805696f 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,19 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * In UUID version 7, we use 12 bits in "rand_a" to store 1/4096 fractions of
+ * sub-millisecond. This is the minimum amount of nanoseconds that guarantees
+ * step advancement of sub-millisecond part.
+ */
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +52,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +418,23 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +446,165 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 */
+	uuid_set_version(uuid, 4);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP_NS
+ * greater than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday() where available, instead use
+	 * clock_gettime() with CLOCK_REALTIME in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^-12) fraction of sub-millisecond precision.
+ */
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To calculate the
+	 * time shift correctly, we convert the UNIX epoch to TimestampTz and use
+	 * timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry back the nanoseconds.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to a UNIX epoch and carry back
+	 * nanoseconds.
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate a UUID */
+	uuid = generate_uuidv7(ns);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +614,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +633,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +660,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +668,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd382..3353e9d6e36 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted by specified interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d2..43e7180a161 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed31..bd83f6b0763 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -168,6 +173,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +222,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -208,4 +260,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a7..8e54217a75c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -85,6 +90,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -92,12 +113,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.5
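
For readers following the epoch arithmetic in uuidv7_interval() above, here is a minimal standalone sketch of the round trip: Unix nanoseconds to PostgreSQL-epoch microseconds, a shift, and back, with the sub-microsecond remainder carried over. This is an illustration, not code from the patch. All sketch_* names are hypothetical, the Julian dates are written out as literals, and a plain microsecond addition stands in for timestamptz_pl_interval(), so month and day components of an interval (and time zone rules) are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_US	INT64_C(1000)

/*
 * Microseconds between the Unix epoch (Julian date 2440588) and the
 * PostgreSQL epoch (Julian date 2451545). Hypothetical name; the patch
 * spells this out with POSTGRES_EPOCH_JDATE and UNIX_EPOCH_JDATE.
 */
#define SKETCH_EPOCH_DIFF_US \
	((INT64_C(2451545) - INT64_C(2440588)) * INT64_C(86400) * INT64_C(1000000))

/* Unix nanoseconds -> microseconds since the PostgreSQL epoch */
static int64_t
sketch_unix_ns_to_pg_us(int64_t ns)
{
	assert(ns >= 0);
	return ns / NS_PER_US - SKETCH_EPOCH_DIFF_US;
}

/* PostgreSQL-epoch microseconds -> Unix nanoseconds, re-adding the carry */
static int64_t
sketch_pg_us_to_unix_ns(int64_t ts_us, int64_t carry_ns)
{
	return (ts_us + SKETCH_EPOCH_DIFF_US) * NS_PER_US + carry_ns;
}

/*
 * Shift a Unix timestamp (ns) by a fixed number of microseconds.
 * Assumption: the shift is a fixed-length duration; the real function
 * delegates to timestamptz_pl_interval() instead of adding directly.
 */
static int64_t
sketch_shift_unix_ns(int64_t ns, int64_t shift_us)
{
	int64_t		ts = sketch_unix_ns_to_pg_us(ns) + shift_us;

	/* microsecond math drops ns % 1000, so carry it back explicitly */
	return sketch_pg_us_to_unix_ns(ts, ns % NS_PER_US);
}
```

Calling sketch_shift_unix_ns() with a shift of 86400000000 (one day in microseconds) moves the timestamp forward by exactly one day while leaving the last three decimal digits of the nanosecond value unchanged.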

#175

[0]: https://mariadb.com/resources/blog/announcing-mariadb-community-server-11-7-rc-with-vector-search-and-11-6-ga/
[1]: https://github.com/mariadb/server/blob/main/plugin/type_uuid/sql_type_uuid_v7.h#L32

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#174)

2 attachment(s)

Re: UUID v7

On 23 Nov 2024, at 10:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that squashed changes I made for v33.
We're still discussing increasing entropy on Windows and macOS, but
the patch seems to be in good shape.

+1, thanks!

PFA version with improved comment.

Sergey Prokhorenko just drew my attention to the new release of MariaDB [0]. They generate UUID v7 very similarly to how we do [1].

Best regards, Andrey Borodin.

Attachments:

v37-0001-Implement-UUID-v7.patch (application/octet-stream)

From 216a6269605f09c363d8e64f0e368588aa5cd4e6 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v37 1/2] Implement UUID v7
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit adds functions for UUID generation. The most important one
is uuidv7(), which generates a new UUID according to RFC 9562.
For code readability, this commit also adds uuidv4() as an alias for
gen_random_uuid().

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 ++-
 src/backend/utils/adt/uuid.c             | 213 +++++++++++++++++++++--
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  56 +++++-
 src/test/regress/sql/uuid.sql            |  28 ++-
 7 files changed, 314 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 73979f20ff..03161b3f87 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+   This function returns a version 7 UUID (UNIX timestamp with 1ms precision +
+   randomly seeded counter + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..b137805696 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,19 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * In UUID version 7, we use 12 bits in "rand_a" to store 1/4096 fractions of
+ * sub-millisecond. This is the minimum amount of nanoseconds that guarantees
+ * step advancement of sub-millisecond part.
+ */
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +52,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +418,23 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +446,165 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 */
+	uuid_set_version(uuid, 4);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP_NS
+ * greater than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday() where available, instead use
+	 * clock_gettime() with CLOCK_REALTIME in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^12) fraction of sub-millisecond precision.
+ */
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/* sub-millisecond timestamp fraction (12 bits) */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To calculate the
+	 * time shift correctly, we convert the UNIX epoch to TimestampTz and use
+	 * timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry the nanoseconds over separately.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to a UNIX epoch and carry back
+	 * nanoseconds.
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate a UUID */
+	uuid = generate_uuidv7(ns);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +614,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +633,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +660,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +668,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..3353e9d6e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted by specified interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 6026e15ed3..bd83f6b076 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -168,6 +173,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -188,8 +222,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -208,4 +260,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index c88f6d087a..8e54217a75 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -85,6 +90,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -92,12 +113,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.39.5 (Apple Git-154)
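
To make the byte layout used by generate_uuidv7() and the version 7 branch of uuid_extract_timestamp() easier to follow, here is a minimal standalone sketch that builds a version 7 UUID from a nanosecond timestamp and reads the millisecond timestamp back. It is an illustration under stated assumptions, not the patch itself: the sketch_* names are hypothetical, a plain 16-byte array replaces pg_uuid_t, and the trailing 8 bytes are passed in by the caller, whereas the patch fills them with pg_strong_random().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NS_PER_MS	INT64_C(1000000)
#define SUBMS_BITS	12

/*
 * Build a version 7 UUID from a Unix timestamp in nanoseconds.
 * "rnd" supplies bytes 8..15; assumption: the caller obtained them
 * from a cryptographically strong source.
 */
static void
sketch_build_uuidv7(unsigned char out[16], int64_t ns,
					const unsigned char rnd[8])
{
	int64_t		unix_ts_ms;
	int32_t		frac;

	assert(ns >= 0);
	unix_ts_ms = ns / NS_PER_MS;

	/* 48-bit big-endian millisecond timestamp */
	out[0] = (unsigned char) (unix_ts_ms >> 40);
	out[1] = (unsigned char) (unix_ts_ms >> 32);
	out[2] = (unsigned char) (unix_ts_ms >> 24);
	out[3] = (unsigned char) (unix_ts_ms >> 16);
	out[4] = (unsigned char) (unix_ts_ms >> 8);
	out[5] = (unsigned char) unix_ts_ms;

	/* 12-bit sub-millisecond fraction stored in "rand_a" */
	frac = (int32_t) (((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS);
	out[6] = (unsigned char) (frac >> 8);
	out[7] = (unsigned char) frac;

	memcpy(out + 8, rnd, 8);

	/* version in the top four bits of byte 6, variant 10 in byte 8 */
	out[6] = (out[6] & 0x0f) | 0x70;
	out[8] = (out[8] & 0x3f) | 0x80;
}

/* Read back the 48-bit Unix timestamp in milliseconds */
static int64_t
sketch_uuidv7_unix_ms(const unsigned char u[16])
{
	uint64_t	tms = 0;

	for (int i = 0; i < 6; i++)
		tms = (tms << 8) | u[i];
	return (int64_t) tms;
}

/* Version nibble, as uuid_extract_version() would report it */
static int
sketch_uuid_version(const unsigned char u[16])
{
	return u[6] >> 4;
}
```

The timestamp bytes of the RFC 9562 version 7 test vector used in the regression test (017F22E2-79B0-...) correspond to 1645557742000 ms since the Unix epoch, which is a convenient value to check such a sketch against.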

v37-0002-Mix-in-2-bits-of-entropy-into-timestampt-of-UUID.patch (application/octet-stream)

From 6f16c43c40bda7cf663c56b3efdb7f58ecd367ea Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Sat, 23 Nov 2024 11:15:49 +0300
Subject: [PATCH v37 2/2] Mix in 2 bits of entropy into timestamp of UUID on
 macOS

---
 src/backend/utils/adt/uuid.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index b137805696..ce8cf64908 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -31,12 +31,19 @@
 #define NS_PER_US	INT64CONST(1000)
 
 /*
- * In UUID version 7, we use 12 bits in "rand_a" to store 1/4096 fractions of
- * sub-millisecond. This is the minimum amount of nanoseconds that guarantees
- * step advancement of sub-millisecond part.
+ * In UUID version 7, we use 12 bits in "rand_a" to store 1/4096
+ * fractions of a millisecond. On systems that have only 10 bits of sub-ms
+ * precision we still use 1/4096 parts of a millisecond, but fill the lower
+ * 2 bits with random numbers. SUBMS_MINIMAL_STEP_NS is the minimum number of
+ * nanoseconds that guarantees a step of the increased clock precision.
  */
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
 #define SUBMS_BITS	12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
 
 /* sortsupport for uuid */
 typedef struct
@@ -523,7 +530,10 @@ generate_uuidv7(int64 ns)
 	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
 	uuid->data[5] = (unsigned char) unix_ts_ms;
 
-	/* sub-millisecond timestamp fraction (12 bits) */
+	/* 
+	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
+	 * SUBMS_MINIMAL_STEP_BITS)
+	 */
 	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
 
 	/* Fill the increased clock precision to "rand_a" bits */
@@ -536,6 +546,19 @@ generate_uuidv7(int64 ns)
 				(errcode(ERRCODE_INTERNAL_ERROR),
 				 errmsg("could not generate random values")));
 
+#if defined(__darwin__) || defined(WIN32)
+	/*
+	 * On macOS, real time is truncated to microseconds. Thus, the 2 least
+	 * significant bits depend on other time-specific bits and do not
+	 * contribute to uniqueness. To make these bits random, we mix in two
+	 * bits from the CSPRNG.
+	 *
+	 * SUBMS_MINIMAL_STEP_NS is chosen so that we still guarantee monotonicity
+	 * despite altering these bits.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
 	/*
 	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
 	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
-- 
2.39.5 (Apple Git-154)

#176

wenhui qiu

qiuwenhuifx@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#175)

Re: UUID v7

Hi Andrey M. Borodin,
It's not just MariaDB; Percona also implements a UUID plugin.

https://docs.percona.com/percona-server/8.4/uuid-versions.html#functions-available-in-uuid_vx

Thanks

On Sat, Nov 23, 2024 at 16:21, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 23 Nov 2024, at 10:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that squashed changes I made for v33.
We're still discussing increasing entropy on Windows and macOS, but
the patch seems to be in good shape.

+1, thanks!

PFA version with improved comment.

Sergey Prokhorenko just drew my attention to the new release of MariaDB
[0]. They are doing very similar UUID v7 generation as we do [1].

Best regards, Andrey Borodin.

[0]
https://mariadb.com/resources/blog/announcing-mariadb-community-server-11-7-rc-with-vector-search-and-11-6-ga/
[1]
https://github.com/mariadb/server/blob/main/plugin/type_uuid/sql_type_uuid_v7.h#L32

#177

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#175)

Re: UUID v7

On Sat, Nov 23, 2024 at 12:20 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 23 Nov 2024, at 10:58, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached an updated patch that squashed changes I made for v33.
We're still discussing increasing entropy on Windows and macOS, but
the patch seems to be in good shape.

+1, thanks!

PFA version with improved comment.

Thank you for updating the patch!

In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":

+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
 #define SUBMS_BITS     12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
SUBMS_MINIMAL_STEP_BITS)) + 1)

on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:

+#if defined(__darwin__) || defined(WIN32)
+       /*
+        * On MacOS real time is truncted to microseconds. Thus, 2 least
+        * significant are dependent on other time-specific bits, thus
they do not
+        * contribute to uniqueness. To make these bit random we mix in two bits
+        * from CSPRNG.
+        *
+        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+        * despite altering these bits.
+        */
+       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif

Is there a reason for using different macros?

In get_real_time_ns_ascending(), we use _MSC_VER so we use
clock_gettime() on MinGW.

Sergey Prokhorenko just drew my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].

Thank you for the references. It made me think that we can use the
function name uuid_v7() rather than uuidv7().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#178

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#177)

Re: UUID v7

On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":

+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
#define SUBMS_BITS     12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
SUBMS_MINIMAL_STEP_BITS)) + 1)

on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:

+#if defined(__darwin__) || defined(WIN32)
+       /*
+        * On MacOS real time is truncted to microseconds. Thus, 2 least
+        * significant are dependent on other time-specific bits, thus
they do not
+        * contribute to uniqueness. To make these bit random we mix in two bits
+        * from CSPRNG.
+        *
+        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+        * despite altering these bits.
+        */
+       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif

Is there a reason for using different macros?

No, that's an oversight. We should mix these 2 bits if and only if SUBMS_MINIMAL_STEP_BITS=10.
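
A minimal sketch of how the "if and only if" condition could be expressed so that the two #if tests cannot drift apart: the mixing is gated on the value of SUBMS_MINIMAL_STEP_BITS instead of on a second platform test. This is an illustration, not the code from the patch. The names SUBMS_RANDOM_FILL_BITS and mix_low_bits() are hypothetical, and in a standalone build neither platform macro may be defined, in which case the 12-bit branch is taken and the helper does nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Platform test as written in the patch: 10 bits where the clock is coarse. */
#if defined(__darwin__) || defined(_MSC_VER)
#define SUBMS_MINIMAL_STEP_BITS 10
#else
#define SUBMS_MINIMAL_STEP_BITS 12
#endif
#define SUBMS_BITS 12

/* Hypothetical: number of low "rand_a" bits the clock cannot supply. */
#define SUBMS_RANDOM_FILL_BITS (SUBMS_BITS - SUBMS_MINIMAL_STEP_BITS)

/*
 * Hypothetical helper: return the low "rand_a" byte with the top
 * SUBMS_RANDOM_FILL_BITS bits of a random byte XORed into its low bits.
 * When the clock already provides all SUBMS_BITS bits, it is a no-op.
 */
static inline uint8_t
mix_low_bits(uint8_t rand_a_low, uint8_t random_byte)
{
	assert(SUBMS_RANDOM_FILL_BITS >= 0 && SUBMS_RANDOM_FILL_BITS < 8);
#if SUBMS_RANDOM_FILL_BITS > 0
	return (uint8_t) (rand_a_low ^ (random_byte >> (8 - SUBMS_RANDOM_FILL_BITS)));
#else
	(void) random_byte;			/* nothing to mix in */
	return rand_a_low;
#endif
}
```

With this shape, the call site would read uuid->data[7] = mix_low_bits(uuid->data[7], uuid->data[8]) on every platform, and only the definition of SUBMS_MINIMAL_STEP_BITS would carry a platform test.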

<tldr>
In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
So on Saturday I squashed v33+change_v33.patch, and composed a message that I think we still should switch to _MSC_VER. And just before sending I received your message with v36 where you used _MSC_VER :)

I think this way:
_MSC_VER - native Windows without clock_gettime, we use gettimeofday() and 10 bits of sub-ms.
MinGW - we use clock_gettime() and 12 bits.
Darwin - we use clock_gettime() and 10 bits.
Anything else - clock_gettime() and 12 bits.
</tldr>

In get_real_time_ns_ascending(), we use _MSC_VER so we use
clock_gettime() on MinGW.

Sergey Prokhorenko just drew my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].

Thank you for the references. It made me think that we can use the
function name uuid_v7() rather than uuidv7().

I think it's a good idea if we will be kind of SQL-compatible.

Best regards, Andrey Borodin.

#179

[1]: https://github.com/Alexpux/mingw-w64/blob/d0d7f784833bbb0b2d279310ddc6afb52fe47a46/mingw-w64-libraries/winpthreads/src/clock.c#L119

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#178)

1 attachment(s)

Re: UUID v7

On Mon, Nov 25, 2024 at 10:15 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
#define SUBMS_BITS     12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
SUBMS_MINIMAL_STEP_BITS)) + 1)
on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:
+#if defined(__darwin__) || defined(WIN32)
+       /*
+        * On MacOS real time is truncted to microseconds. Thus, 2 least
+        * significant are dependent on other time-specific bits, thus
they do not
+        * contribute to uniqueness. To make these bit random we mix in two bits
+        * from CSPRNG.
+        *
+        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+        * despite altering these bits.
+        */
+       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
Is there a reason for using different macros?
No, that's an oversight. We should mix these 2 bits if and only if SUBMS_MINIMAL_STEP_BITS=10.

<tldr>
In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
So on Saturday I squashed v33+change_v33.patch, and composed a message that I think we still should switch to _MSC_VER. And just before sending I received your message with v36 where you used _MSC_VER :)

I think this way:
_MSC_VER - native Windows without clock_gettime, we use gettimeofday() and 10 bits of sub-ms.
MinGW - we use clock_gettime() and 12 bits.
Darwin - we use clock_gettime() and 10 bits.
Anything else - clock_gettime() and 12 bits.
</tldr>

Thank you for the summary.

On MinGW, IIUC we can get 100-ns precision timestamps [1], so using 12
bits for calculating the minimal step would make sense.
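
As a worked example of the arithmetic behind this, the sketch below evaluates the SUBMS_MINIMAL_STEP_NS formula from the patch for both bit widths. The function names are hypothetical and the code is only an illustration. One slot of the 12-bit fraction is about 244 ns wide, so a 100-ns clock is finer than a slot, while a microsecond clock is not.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS INT64_C(1000000)

/*
 * Same formula as SUBMS_MINIMAL_STEP_NS in the patch, with the bit width
 * as a parameter: the smallest step in nanoseconds that is guaranteed to
 * advance a sub-millisecond fraction of "bits" bits.
 */
static inline int64_t
subms_minimal_step_ns(int bits)
{
	assert(bits > 0 && bits < 20);
	return NS_PER_MS / (INT64_C(1) << bits) + 1;
}

/*
 * Timestamp expressed in units of 1/2^bits of a millisecond. For a
 * nanosecond count this is the millisecond part and the fraction combined.
 */
static inline int64_t
subms_ticks(int64_t ns, int bits)
{
	return (ns * (INT64_C(1) << bits)) / NS_PER_MS;
}
```

Under these definitions subms_minimal_step_ns(12) is 245 and subms_minimal_step_ns(10) is 977, so on the 10-bit platforms consecutive UUIDs from one backend are spaced just under a microsecond apart.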

Also, if we implement the Windows port of clock_gettime() in the
future, we can remove the part of using gettimeofday() in
get_real_time_ns_ascending(). It seems to me that it's
over-engineering to implement that part only for the UUID v7. So the
current implementation of get_real_time_ns_ascending() makes sense to
me.

In get_real_time_ns_ascending(), we use _MSC_VER so we use
clock_gettime() on MinGW.

Sergey Prokhorenko just drew my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].

Thank you for the references. It made me think that we can use the
function name uuid_v7() rather than uuidv7().

I think it's a good idea if we will be kind of SQL-compatible.

Okay, let's rename it.

I've merged patches and renamed functions (also updated the commit
message). Please find the attachment.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v38-0001-Add-UUID-version-7-generation-function.patch (application/octet-stream)

From c589d9d5ea6ff5d814d2ad4813b8d7cc933bd0e2 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v38] Add UUID version 7 generation function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit implements the uuid_v7() function for generating UUID
version 7, as defined in RFC 9562. UUID v7 comprises a Unix timestamp
in milliseconds and random bits, providing uniqueness and
sortability.

In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, referred to as "rand_a" in
the RFC. This ensures additional monotonicity within a millisecond.

Additionally, an alias uuid_v4() is added for the existing
gen_random_uuid() function to maintain consistency.

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 +-
 src/backend/utils/adt/uuid.c             | 237 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  56 +++++-
 src/test/regress/sql/uuid.sql            |  28 ++-
 7 files changed, 338 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1c..3e6751d64cc 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 13ccbe7d78c..ab1d42ae759 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuid_v4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuid_v7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv_4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv_7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+    This function returns a version 7 UUID (UNIX timestamp with millisecond
+    precision + sub-millisecond + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc4..4c907fa80aa 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,26 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * In UUID version 7, we use 12 bits in "rand_a" to store 1/4096
+ * fractions of sub-millisecond. On systems that have only 10 bits of sub-ms
+ * precision we still use 1/4096 parts of a millisecond, but fill lower 2 bits
+ * with random numbers. SUBMS_MINIMAL_STEP is the minimum amount of
+ * nanoseconds that guarantees step of UUID increased clock precision.
+ */
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +59,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +425,23 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +453,182 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 4);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday() where available, instead use
+	 * clock_gettime() with CLOCK_REALTIME in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^12) fraction of sub-millisecond precision.
+ */
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuid_v7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/*
+	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
+	 * SUBMS_MINIMAL_STEP_BITS)
+	 */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+#if defined(__darwin__) || defined(_MSC_VER)
+
+	/*
+	 * On MacOS real time is truncated to microseconds. Thus, 2 least
+	 * significant are dependent on other time-specific bits, thus they do not
+	 * contribute to uniqueness. To make these bit random we mix in two bits
+	 * from CSPRNG.
+	 *
+	 * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+	 * despite altering these bits.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuid_v7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuid_v7(get_real_time_ns_ascending());
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Similar to uuid_v7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuid_v7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To make correct
+	 * calculating the time shift, we convert the UNIX epoch to TimestampTz
+	 * and use timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry back the nanoseconds.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to an UNIX epoch and carry back
+	 * nanoseconds.
+	 */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate an UUID */
+	uuid = generate_uuid_v7(ns);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +638,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +657,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +684,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +692,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd382..9210edfec81 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuid_v4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuid_v7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuid_v7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuid_v7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d2..2341907a0b0 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuid_v4()
+uuid_v7()
+uuid_v7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8f4ef0d7a6a..b24e85d4475 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -199,6 +204,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuid_v4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuid_v4());
+INSERT INTO guid1 (guid_field) VALUES (uuid_v4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (uuid_v7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuid_v7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -219,8 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuid_v4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuid_v7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -239,4 +291,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 75ee966ded0..03a5a08be4d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -97,6 +102,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuid_v4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuid_v4());
+INSERT INTO guid1 (guid_field) VALUES (uuid_v4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (uuid_v7());
+INSERT INTO guid1 (guid_field) VALUES (uuid_v7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuid_v7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -104,12 +125,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuid_v4()); --4
+SELECT uuid_extract_version(uuid_v7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.5
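
For readers who want to check the v7 test vector used in the regression test by hand, here is a small standalone sketch of the timestamp decoding. It follows the same big-endian layout as the patch but uses plain byte arrays instead of pg_uuid_t; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read the 48-bit big-endian unix_ts_ms field from the first six bytes. */
static inline uint64_t
uuid_v7_unix_ts_ms(const unsigned char data[16])
{
	uint64_t	tms = 0;

	for (int i = 0; i < 6; i++)
		tms = (tms << 8) | data[i];
	return tms;
}

/* The version is stored in the top four bits of byte 6. */
static inline int
uuid_version_of(const unsigned char data[16])
{
	return data[6] >> 4;
}
```

For 017F22E2-79B0-7CC3-98C4-DC0C0C07398F the first six bytes are 0x017F22E279B0, which is 1645557742000 ms after the Unix epoch, that is 2022-02-22 19:22:22 UTC.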

#180

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#179)

Re: UUID v7

On 26 Nov 2024, at 01:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've merged patches and renamed functions (also updated the commit
message). Please find the attachment.

This comment
* On MacOS real time is truncated to microseconds.
should also note that on Windows we use a ported version of gettimeofday(). The interface of this function limits us to only 10 bits, just like on macOS.
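
To illustrate the "only 10 bits" point, the following sketch counts how many distinct 12-bit sub-millisecond fractions can occur when the clock only reports whole microseconds. It reuses the fraction formula from generate_uuid_v7(); the function name is hypothetical and this is only a back-of-the-envelope check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NS_PER_MS	1000000
#define NS_PER_US	1000
#define SUBMS_BITS	12

/*
 * Walk every microsecond within one millisecond and count how many
 * different SUBMS_BITS-bit fractions the patch's formula produces.
 */
static int
count_reachable_fractions(void)
{
	unsigned char seen[1 << SUBMS_BITS];
	int			count = 0;

	memset(seen, 0, sizeof(seen));
	for (int64_t ns = 0; ns < NS_PER_MS; ns += NS_PER_US)
	{
		int32_t		frac = (int32_t) ((ns * (1 << SUBMS_BITS)) / NS_PER_MS);

		assert(frac >= 0 && frac < (1 << SUBMS_BITS));
		if (!seen[frac])
		{
			seen[frac] = 1;
			count++;
		}
	}
	return count;
}
```

Only 1000 of the 4096 possible values are reachable, which is slightly less than 10 bits of information. That is why the low 2 bits are filled from the random source on these platforms.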

Besides that, the patch looks good to me. Thanks!

Best regards, Andrey Borodin.

#181

Japin Li

japinli@hotmail.com

about 1 year ago

In reply to: Masahiko Sawada (#179)

Re: UUID v7

On Mon, 25 Nov 2024 at 12:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 25, 2024 at 10:15 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
#define SUBMS_BITS     12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 <<
SUBMS_MINIMAL_STEP_BITS)) + 1)
on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:
+#if defined(__darwin__) || defined(WIN32)
+       /*
+        * On MacOS real time is truncted to microseconds. Thus, 2 least
+        * significant are dependent on other time-specific bits, thus
they do not
+        * contribute to uniqueness. To make these bit random we mix in two bits
+        * from CSPRNG.
+        *
+        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+        * despite altering these bits.
+        */
+       uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
Is there a reason for using different macros?
No, that's an oversight. We should mix these 2 bits if and only if SUBMS_MINIMAL_STEP_BITS=10.

<tldr>
In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
So on Saturday I squashed v33+change_v33.patch, and composed a
message that I think we still should switch to _MSC_VER. And just
before sending I received your message with v36 where you used
_MSC_VER :)

I think this way:
_MSC_VER - native Windows without clock_gettime, we used gettimeofday() and 10 bits of sub-ms.
MinGW - we use clock_gettime() and 12 bits.
Darwin - we use clock_gettime() and 10 bits.
Anything else - clock_gettime() and 12 bits.
</tldr>
Thank you for the summary.

On MinGW, IIUC we can get 100-ns precision timestamps[1], so using 12
bits for calculating the minimal step would make sense.

Also, if we implement the Windows port of clock_gettime() in the
future, we can remove the part of using gettimeofday() in
get_real_time_ns_ascending(). It seems to me that it's
over-engineering to implement that part only for the UUID v7. So the
current implementation of get_real_time_ns_ascending() makes sense to
me.

In get_real_time_ns_ascending(), we use _MSC_VER so we use
clock_gettime() on MinGW.
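
For reference, the clock_gettime() path discussed here can be sketched roughly as below. This is a simplified, standalone variant rather than the patch itself: real_time_ns_ascending() and its step_bits parameter are hypothetical, and the gettimeofday() branch used under _MSC_VER is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() on strict compilers */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NS_PER_S	INT64_C(1000000000)
#define NS_PER_MS	INT64_C(1000000)

/*
 * Return the current real time in nanoseconds, forced to advance by at
 * least the minimal step since the previous call in this process.
 * step_bits is 10 on microsecond-precision platforms and 12 elsewhere
 * (an assumption mirroring SUBMS_MINIMAL_STEP_BITS in the patch).
 */
static int64_t
real_time_ns_ascending(int step_bits)
{
	static int64_t previous_ns = 0;
	const int64_t step_ns = NS_PER_MS / (INT64_C(1) << step_bits) + 1;
	struct timespec tmp;
	int64_t		ns;

	assert(step_bits == 10 || step_bits == 12);

	clock_gettime(CLOCK_REALTIME, &tmp);
	ns = (int64_t) tmp.tv_sec * NS_PER_S + tmp.tv_nsec;

	/* If the clock did not move far enough (or went backwards), bump it. */
	if (previous_ns + step_ns >= ns)
		ns = previous_ns + step_ns;
	previous_ns = ns;

	return ns;
}
```

Two consecutive calls always differ by at least the step (245 ns with 12 bits, 977 ns with 10 bits), even if CLOCK_REALTIME reports the same value twice.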

Sergey Prokhorenko just drew my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].

Thank you for the references. It made me think that we can use the
function name uuid_v7() rather than uuidv7().

I think it's a good idea if we will be kind of SQL-compatible.

Okay, let's rename it.

I've merged patches and renamed functions (also updated the commit
message). Please find the attachment.

There seems to be a typo in uuid_v{4,7}: the documentation spells them uuidv_4 and uuidv_7.

+<function>uuidv_4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv_7</function> () <returnvalue>uuid</returnvalue>

--
Regards,
Japin Li

#182

[1]: https://github.com/Alexpux/mingw-w64/blob/d0d7f784833bbb0b2d279310ddc6afb52fe47a46/mingw-w64-libraries/winpthreads/src/clock.c#L119

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Masahiko Sawada (#179)

Re: UUID v7

Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.
It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Monday 25 November 2024 at 11:12:35 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Nov 25, 2024 at 10:15 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 25 Nov 2024, at 22:53, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

In the following code, we use "defined(__darwin__) || defined(_MSC_VER)":
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
#define SUBMS_BITS    12
-#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_BITS)) + 1)
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
on the other hand, we use "defined(__darwin__) || defined(WIN32)" here:
+#if defined(__darwin__) || defined(WIN32)
+      /*
+        * On MacOS real time is truncated to microseconds. Thus, 2 least
+        * significant are dependent on other time-specific bits, thus they do not
+        * contribute to uniqueness. To make these bit random we mix in two bits
+        * from CSPRNG.
+        *
+        * SUBMS_MINIMAL_STEP is chosen so that we still guarantee monotonicity
+        * despite altering these bits.
+        */
+      uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
Is there a reason for using different macros?
No, that's an oversight. We should mix these 2 bits if and only if SUBMS_MINIMAL_STEP_BITS=10.

<tldr>
In your review change_v33.patch you used WIN32, but it did not actually compile on Windows.
So on Saturday I squashed v33+change_v33.patch, and composed a message that I think we still should switch to _MSC_VER. And just before sending I received your message with v36 where you used _MSC_VER :)

I think this way:
_MSC_VER - native Windows without clock_gettime, we used gettimeofday() and 10 bits of sub-ms.
MinGW - we use clock_gettime() and 12 bits.
Darwin - we use clock_gettime() and 10 bits.
Anything else - clock_gettime() and 12 bits.
</tldr>

Thank you for the summary.

On MinGW, IIUC we can get 100-ns precision timestamps[1], so using 12
bits for calculating the minimal step would make sense.

In get_real_time_ns_ascending(), we use _MSC_VER so we use
clock_gettime() on MinGW.

Sergey Prokhorenko just drew my attention to the new release of MariaDB [0]. They are doing very similar UUID v7 generation as we do [1].

Thank you for the references. It made me think that we can use the
function name uuid_v7() rather than uuidv7().

I think it's a good idea if we will be kind of SQL-compatible.

Okay, let's rename it.

I've merged patches and renamed functions (also updated the commit
message). Please find the attachment.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#183

sawada.mshk@gmail.com

about 1 year ago

In reply to: Sergey Prokhorenko (#182)

Re: UUID v7

On Tue, Nov 26, 2024 at 11:11 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.

It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.

Thank you for pointing it out. How about gen_uuidv7() and gen_uuidv4()
as we already have gen_random_uuid()?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#184

przemyslaw@sztoch.pl

about 1 year ago

In reply to: Masahiko Sawada (#183)

Re: UUID v7

A lot of people use https://www.postgresql.org/docs/current/uuid-ossp.html.

And `uuid_generate_v7()` will be the continuation...

From my point of view, absorbing uuid_generate_v5 into mainline would
be a great move too.

On 26.11.2024 20:30, Masahiko Sawada wrote:

On Tue, Nov 26, 2024 at 11:11 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.

It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.

Thank you for pointing it out. How about gen_uuidv7() and gen_uuidv4()
as we already have gen_random_uuid()?

Regards,

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#185

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Przemysław Sztoch (#184)

Re: UUID v7

gen_uuidv7() is OK
uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Tuesday 26 November 2024 at 10:35:20 pm GMT+3, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:

A lot of people use https://www.postgresql.org/docs/current/uuid-ossp.html.

And `uuid_generate_v7()` will be the continuation...

From my point of view, absorbing uuid_generate_v5 into mainline would be a great move too.

On 26.11.2024 20:30, Masahiko Sawada wrote:

On Tue, Nov 26, 2024 at 11:11 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Changing the name uuidv7() to uuid_v7() is a bad idea because the RFC 9562 uses the term UUIDv7, and therefore code containing uuid_v7() will not be found by searching the web in most cases.

It makes much more sense to rename it to get_uuidv7(), so that a query for "uuidv7" does not return a bunch of other unnecessary functions related to UUIDv7.

Thank you for pointing it out. How about gen_uuidv7() and gen_uuidv4()
as we already have gen_random_uuid()?

Regards,

--
Przemysław Sztoch | Mobile +48 509 99 00 66

#186

[1]: /messages/by-id/CAGECzQQ=38bVUR=LZ6vmBCEjaDfOOoQa+ygFJ1mCG_H2jsC90Q@mail.gmail.com
[2]: /messages/by-id/CAGECzQS=EjfLxdX89N95tHFGXS4m1aj2V_+xrJppBohgaKQhtQ@mail.gmail.com

postgres@jeltef.nl

about 1 year ago

In reply to: Sergey Prokhorenko (#185)

Re: UUID v7

On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

gen_uuidv7() is OK

I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()

uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names

agreed

#187

sawada.mshk@gmail.com

about 1 year ago

In reply to: Jelte Fennema-Nio (#186)

1 attachment(s)

Re: UUID v7

On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

gen_uuidv7() is OK

I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()

uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names

agreed

It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
is discussion whether we should add 'gen_' or 'get_' but let's go back
to the previously-agreed function name 'uuidv7()' for now. We can
rename it later if we find a better name.

I've attached the new version patch that incorporated all comments and
renamed the functions. Also I avoided using 'if defined(__darwin__) ||
defined(_MSC_VER)' twice.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v39-0001-Add-UUID-version-7-generation-function.patch (application/octet-stream)

From 7d5b7c2740b242679bc6d7cc90c43a1cde440f64 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v39] Add UUID version 7 generation function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit implements the uuidv7() SQL function for generating UUID
version 7, as defined in RFC 9562. UUID v7 comprises a Unix timestamp
in milliseconds and random bits, providing uniqueness and
sortability.

In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, referred to as "rand_a" in
the RFC. This ensures additional monotonicity within a millisecond.

Additionally, an alias uuidv4() is added for the existing
gen_random_uuid() SQL function to maintain consistency.

Bump catalog version.

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 +-
 src/backend/utils/adt/uuid.c             | 243 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  56 +++++-
 src/test/regress/sql/uuid.sql            |  28 ++-
 7 files changed, 344 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1c..3e6751d64cc 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 13ccbe7d78c..848ae564540 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14213,6 +14213,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14222,12 +14230,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+    This function returns a version 7 UUID (UNIX timestamp with millisecond
+    precision + sub-millisecond timestamp + random).
   </para>
 
   <para>
@@ -14251,7 +14264,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc4..bb890e9f60d 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,34 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
+ * sub-millisecond. While most Unix-like platforms provide nanosecond-precision
+ * timestamps, some systems only offer microsecond precision, limiting us to 10
+ * bits of sub-millisecond information. For example, on macOS, real time is
+ * truncated to microseconds. Additionally, MSVC uses the ported version of
+ * gettimeofday() that returns microsecond precision.
+ *
+ * On systems with only 10 bits of sub-millisecond precision, we still use
+ * 1/4096 parts of a millisecond, but fill lower 2 bits with random numbers
+ * (see generate_uuidv7() for details).
+ *
+ * SUBMS_MINIMAL_STEP defines the minimum number of nanoseconds that guarantees
+ * an increase in the UUID's clock precision.
+ */
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +67,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending(void);
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +433,23 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +461,180 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 */
+	uuid_set_version(uuid, 4);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending(void)
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday() where available, instead use
+	 * clock_gettime() with CLOCK_REALTIME in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^12) fraction of sub-millisecond precision.
+ */
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/*
+	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
+	 * SUBMS_MINIMAL_STEP_BITS)
+	 */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+#if SUBMS_MINIMAL_STEP_BITS == 10
+
+	/*
+	 * On systems that have only 10 bits of sub-ms precision,  2 least
+	 * significant are dependent on other time-specific bits, and they do not
+	 * contribute to uniqueness. To make these bit random we mix in two bits
+	 * from CSPRNG. SUBMS_MINIMAL_STEP is chosen so that we still guarantee
+	 * monotonicity despite altering these bits.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To make correct
+	 * calculating the time shift, we convert the UNIX epoch to TimestampTz
+	 * and use timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry back the nanoseconds.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to an UNIX epoch and carry back
+	 * nanoseconds.
+	 */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate an UUID */
+	uuid = generate_uuidv7(ns);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +644,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +663,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +690,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +698,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd382..3353e9d6e36 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d2..43e7180a161 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8f4ef0d7a6a..0059a8c7168 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -199,6 +204,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -219,8 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -239,4 +291,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 75ee966ded0..6eb8efbd3d3 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -97,6 +102,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -104,12 +125,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.5
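
For readers who want to see what the v7 test vector in uuid.sql encodes, here is a minimal C sketch that reads the timestamp and version fields straight from the 16 bytes. The function and variable names are made up for this illustration and are not PostgreSQL's uuid_extract_timestamp() or uuid_extract_version(); only the byte layout (48-bit big-endian millisecond timestamp, version in the top four bits of byte 6) comes from RFC 9562.

```c
#include <assert.h>
#include <stdint.h>

/* Read the 48-bit big-endian Unix timestamp (milliseconds) from bytes 0..5. */
int64_t
uuid_v7_unix_ts_ms(const unsigned char uuid[16])
{
    int64_t ms = 0;

    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | uuid[i];
    return ms;
}

/* The version is stored in the top four bits of byte 6. */
int
uuid_version(const unsigned char uuid[16])
{
    return uuid[6] >> 4;
}

/* RFC 9562 v7 test vector 017F22E2-79B0-7CC3-98C4-DC0C0C07398F. */
const unsigned char rfc9562_v7_vector[16] = {
    0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x7C, 0xC3,
    0x98, 0xC4, 0xDC, 0x0C, 0x0C, 0x07, 0x39, 0x8F
};

/* Hypothetical self-check, not part of the patch. */
void
check_v7_test_vector(void)
{
    /* 1645557742000 ms since the Unix epoch is 2022-02-22 19:22:22 UTC. */
    assert(uuid_v7_unix_ts_ms(rfc9562_v7_vector) == INT64_C(1645557742000));
    assert(uuid_version(rfc9562_v7_vector) == 7);
}
```

Because the timestamp occupies the most significant bytes, plain byte-wise comparison of two v7 values orders them by time first, which is what the sortability test above relies on.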

#188

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#187)

Re: UUID v7

On 27 Nov 2024, at 04:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

gen_uuidv7() is OK

I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()

uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names

agreed

It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
is discussion whether we should add 'gen_' or 'get_' but let's go back
to the previously-agreed function name 'uuidv7()' for now. We can
rename it later if we find a better name.

I think uuidv7() is kind of consensual.

I've attached the new version patch that incorporated all comments and
renamed the functions. Also I avoided using 'if defined(__darwin__) ||
defined(_MSC_VER)' twice.

Good, I think now it's a bit easier to understand those 2 bits.

Thanks!

Best regards, Andrey Borodin.

#189

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey Borodin (#188)

Re: UUID v7

On Tue, Nov 26, 2024 at 7:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 27 Nov 2024, at 04:11, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

gen_uuidv7() is OK

I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()

uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names

agreed

It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
is discussion whether we should add 'gen_' or 'get_' but let's go back
to the previously-agreed function name 'uuidv7()' for now. We can
rename it later if we find a better name.

I think uuidv7() is kind of consensual.

I've attached the new version patch that incorporated all comments and
renamed the functions. Also I avoided using 'if defined(__darwin__) ||
defined(_MSC_VER)' twice.

Good, I think now it's a bit easier to understand those 2 bits.

Thanks.

I'm going to push the v39 patch (after self review again), barring any
objections and further comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#190

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Masahiko Sawada (#189)

Re: UUID v7

It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so that IT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions.

I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate, which is meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and multi-threaded insertion of a large number of records (with and without partitioning), as well as the rate of execution of queries to join big tables, to update or delete a large number of records. It is important to measure memory usage, processor load, etc.

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

#191

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Sergey Prokhorenko (#190)

Re: Re: UUID v7

I forgot to mention the incremental download

Sent from Yahoo Mail on iPhone

#192

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Sergey Prokhorenko (#190)

Re: UUID v7

On 28 Nov 2024, at 04:07, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

It would be useful to add a standard comparative benchmark with several parameters and use cases to the patch, so that IT departments can compare UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and conditions.

I know for a fact that IT departments make such benchmarks of low quality. They usually measure the generation rate, which is meaningless because it is usually excessive. It makes sense to measure the rate of single-threaded and multi-threaded insertion of a large number of records (with and without partitioning), as well as the rate of execution of queries to join big tables, to update or delete a large number of records. It is important to measure memory usage, processor load, etc.

Publishing benchmarks seems to be far beyond what our documentation goes for, mostly because benchmarks are tricky. You can prove anything with benchmarks.

Everyone is welcome to publish benchmark results in their blogs, but IMO docs have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39 applied on my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from generate_series(1,3e7);
INSERT 0 30000000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from generate_series(1,3e7);
INSERT 0 30000000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)
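
As a quick sanity check on the two timings above, the arithmetic can be written out in a few lines of C. This is only a sketch; the helper names are invented for the illustration, and the inputs are the row count and the psql "Time:" values from the session above.

```c
#include <assert.h>

/* Rows inserted per second, given a row count and elapsed time in ms. */
double
rows_per_second(double rows, double elapsed_ms)
{
    return rows / (elapsed_ms / 1000.0);
}

/* How many times faster the second run was than the first. */
double
speedup(double slow_ms, double fast_ms)
{
    return slow_ms / fast_ms;
}

/* Hypothetical self-check using the timings reported in this message. */
void
check_benchmark_numbers(void)
{
    double s = speedup(2003918.770, 337001.315);

    /* uuidv7() finished the 3e7-row insert a little under 6 times faster */
    assert(s > 5.9 && s < 6.0);
}
```

On these numbers that is roughly 15 thousand rows per second for uuidv4() against roughly 89 thousand for uuidv7(), on this one machine and this one workload.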

Best regards, Andrey Borodin.

#193

peter@eisentraut.org

about 1 year ago

In reply to: Masahiko Sawada (#187)

Re: UUID v7

On 27.11.24 00:11, Masahiko Sawada wrote:

On Tue, Nov 26, 2024 at 1:55 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Tue, 26 Nov 2024 at 21:48, Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

gen_uuidv7() is OK

I'd very much prefer to not have a gen_ or get_ prefix as argued before[1][2].

My vote is still for simply uuidv7() and uuidv4()

uuid-ossp is outdated, slow and not supported by the author. UUIDv7 is the renaissance of UUIDs. So we should not depend on legacy technology names

agreed

It seems that we agreed to use 'uuidv7' instead of 'uuid_v7()'. There
is discussion whether we should add 'gen_' or 'get_' but let's go back
to the previously-agreed function name 'uuidv7()' for now. We can
rename it later if we find a better name.

I've attached the new version patch that incorporated all comments and
renamed the functions. Also I avoided using 'if defined(__darwin__) ||
defined(_MSC_VER)' twice.

* doc/src/sgml/func.sgml

The function variant uuidv7(interval) is not documented.

* src/backend/utils/adt/uuid.c

+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{

This should be a block comment, like

/*
 * Set the ...
 */

+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
...
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)

Is "ns" the timestamp argument? What format is it? Explain.

+   /*
+    * Shift the current timestamp by the given interval. To make correct
+    * calculating the time shift, we convert the UNIX epoch to TimestampTz
+    * and use timestamptz_pl_interval(). Since this calculation is done with
+    * microsecond precision, we carry back the nanoseconds.
+    */

This needs a bit of grammar tweaking, I think: "To make correct calculating"

I don't know what the meaning of "carry back" is.

+ Interval *span = PG_GETARG_INTERVAL_P(0);

Not sure why this is named "span"? Maybe "shift" would be better?

* src/include/catalog/pg_proc.dat

+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',

better "shifted by"?

* src/test/regress/expected/opr_sanity.out

+uuidv4()
+uuidv7()
+uuidv7(interval)

Functions without arguments don't need to be marked leakproof.

uuidv7(interval) internally calls timestamptz_pl_interval(), which is
not leakproof, so I don't think that classification is sound.

* src/test/regress/sql/uuid.sql

+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7

Make the whitespace of the comment consistent with the rest of the file.

-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1

Here as well.
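
On the question of what format "ns" has: in the quoted patch it is a count of nanoseconds since the Unix epoch. The sketch below, modeled on generate_uuidv7() and uuid_set_version() from the patch, shows how such a value would be split into the 48-bit millisecond field and the 12-bit sub-millisecond fraction in "rand_a". The function names are hypothetical, and the variant and random bytes 8..15 are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS  INT64_C(1000000)
#define SUBMS_BITS 12

/*
 * Fill bytes 0..7 of a version 7 UUID from "ns", a count of nanoseconds
 * since the Unix epoch. Bytes 8..15 (variant and random bits) are left
 * to the caller in this sketch.
 */
void
fill_v7_time_fields(unsigned char uuid[16], int64_t ns)
{
    int64_t unix_ts_ms = ns / NS_PER_MS;
    /* sub-millisecond remainder scaled to 1/4096 ms units: 0..4095 */
    int32_t frac = (int32_t) (((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS);

    /* 48-bit millisecond timestamp, most significant byte first */
    for (int i = 0; i < 6; i++)
        uuid[i] = (unsigned char) (unix_ts_ms >> (40 - 8 * i));

    /* 12-bit fraction goes into the "rand_a" position */
    uuid[6] = (unsigned char) (frac >> 8);
    uuid[7] = (unsigned char) frac;

    /* version 7 lives in the top four bits of byte 6 */
    uuid[6] = (unsigned char) ((uuid[6] & 0x0f) | (7 << 4));
}

/* Hypothetical self-check, not part of the patch. */
void
check_fill_v7_time_fields(void)
{
    unsigned char uuid[16] = {0};

    /* 1645557742000 ms plus half a millisecond: fraction is 2048 = 0x800 */
    fill_v7_time_fields(uuid, INT64_C(1645557742000) * NS_PER_MS + 500000);

    assert(uuid[0] == 0x01 && uuid[5] == 0xB0);
    assert(uuid[6] == 0x78 && uuid[7] == 0x00);
}
```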

#194

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Andrey M. Borodin (#192)

Re: Re: UUID v7

I don't mean adding benchmark results to the patch, but functions, so that everyone can run the comparison themselves on their own equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, while your benchmark covers only one, the simplest.

Sent from Yahoo Mail on iPhone

#195

Kirill Reshke

reshkekirill@gmail.com

about 1 year ago

In reply to: Sergey Prokhorenko (#194)

Re: Re: Re: UUID v7

On Fri, 29 Nov 2024, 09:14 Sergey Prokhorenko, <
sergeyprokhorenko@yahoo.com.au> wrote:

I mean to add not benchmark results to the patch, but functions so that
everyone can compare themselves on their equipment. The comparison with
UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and
an integer key. And I have described many use cases, and in your benchmark
there is only one, the simplest.

Sent from Yahoo Mail on iPhone

On Thursday, November 28, 2024, 11:09 AM, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 28 Nov 2024, at 04:07, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

It would be useful to add a standard comparative benchmark with several
parameters and use cases to the patch, so that IT departments can compare
UUIDv7, ULID, UUIDv4, Snowflake ID and BIGSERIAL for their hardware and
conditions.

I know for a fact that IT departments make such benchmarks of low
quality. They usually measure the generation rate, which is meaningless
because it is usually excessive. It makes sense to measure the rate of
single-threaded and multi-threaded insertion of a large number of records
(with and without partitioning), as well as the rate of execution of
queries to join big tables, to update or delete a large number of records.
It is important to measure memory usage, processor load, etc.

Publishing benchmarks seems to be far beyond what our documentation goes
for, mostly because benchmarks are tricky. You can prove anything with
benchmarks.

Everyone is welcome to publish benchmark results in their blogs, but IMO
docs have a very different job to do.

I’ll just publish one benchmark in this mailing list. With patch v39
applied on my MB Air M2 I get:

postgres=# create table table_for_uuidv4(id uuid primary key);
CREATE TABLE
Time: 9.479 ms
postgres=# insert into table_for_uuidv4 select uuidv4() from generate_series(1,3e7);
INSERT 0 30000000
Time: 2003918.770 ms (33:23.919)
postgres=# create table table_for_uuidv7(id uuid primary key);
CREATE TABLE
Time: 3.930 ms
postgres=# insert into table_for_uuidv7 select uuidv7() from generate_series(1,3e7);
INSERT 0 30000000
Time: 337001.315 ms (05:37.001)

Almost an order of magnitude better :)

Best regards, Andrey Borodin.

Hi!

Do not top-post on this list

#196

sawada.mshk@gmail.com

about 1 year ago

In reply to: Sergey Prokhorenko (#194)

Re: Re: Re: UUID v7

On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.

I don't think we should add such benchmark functions at least to this
patch. If there already is a well-established workload using UUIDv7
and UUIDv4 etc, users can use pgbench with custom scripts, or it might
make sense to add it to pgbench as a built-in workload. Which however
should be a separate patch. Having said that, I think users should use
benchmarks that fit their workloads, and it would not be easy to
establish workloads that are reasonable for most systems.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#197

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Masahiko Sawada (#196)

Re: Re: Re: UUID v7

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.

#198

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Sergey Prokhorenko (#197)

Re: Re: Re: UUID v7

On 29 Nov 2024, at 18:57, Sergey Prokhorenko <sergeyprokhorenko@yahoo.com.au> wrote:

Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.

I think we have pretty decent documentation in the patch. It only points to the RFC and that's it.
There were patch versions with opinionated novels in the docs: giving advice, comparing possibilities and all that stuff. I'm so happy we got past this stage and moved forward :)

Best regards, Andrey Borodin.

#199

sawada.mshk@gmail.com

about 1 year ago

In reply to: Sergey Prokhorenko (#197)

Re: Re: Re: UUID v7

On Fri, Nov 29, 2024 at 5:59 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko

<sergeyprokhorenko@yahoo.com.au> wrote:

I mean to add not benchmark results to the patch, but functions so that everyone can compare themselves on their equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. And I have described many use cases, and in your benchmark there is only one, the simplest.

I don't think we should add such benchmark functions at least to this
patch. If there already is a well-established workload using UUIDv7
and UUIDv4 etc, users can use pgbench with custom scripts, or it might
make sense to add it to pgbench as a built-in workload. Which however
should be a separate patch. Having said that, I think users should use
benchmarks that fit their workloads, and it would not be easy to
establish workloads that are reasonable for most systems.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Workloads can and must be added with parameters. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to incompetent employees who are not very good at understanding typical key usage scenarios. As a rule, these are programmers, not system analysts. Accordingly, the solution in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.

There are still 4 months left until the feature freeze. We can discuss
this topic and might find solutions. I don't think it's a blocker of
this patch (UUIDv7 implementation patch).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#200

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Peter Eisentraut (#193)

2 attachment(s)

Re: UUID v7

On 29 Nov 2024, at 00:46, Peter Eisentraut <peter@eisentraut.org> wrote:

Here as well.

Peter, many thanks for the next round of review. I agree with all corrections.
I'm sending amendments addressing your review as a separate step in patch set. Step 1 of this patch set is identical to v39.

Thanks!

Best regards, Andrey Borodin.
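
For readers skimming the attached patch, the monotonicity guarantee in get_real_time_ns_ascending() boils down to the following. This is a simplified sketch, not the patch code: the names are invented here, the clock reading is passed in as a parameter instead of coming from clock_gettime(), and the number of guaranteed sub-millisecond bits is an argument rather than a compile-time macro.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS INT64_C(1000000)

/* Smallest step in ns that always advances a counter of 2^bits ticks per ms. */
int64_t
minimal_step_ns(int bits)
{
    return NS_PER_MS / (INT64_C(1) << bits) + 1;
}

/*
 * Return "now_ns", pushed forward if needed so that it is at least one
 * minimal step later than the value stored in *previous_ns.
 */
int64_t
ascending_ns(int64_t *previous_ns, int64_t now_ns, int bits)
{
    int64_t step = minimal_step_ns(bits);

    if (*previous_ns + step >= now_ns)
        now_ns = *previous_ns + step;
    *previous_ns = now_ns;
    return now_ns;
}

/* Hypothetical self-check, not part of the patch. */
void
check_ascending_ns(void)
{
    int64_t prev = 0;
    int64_t a;
    int64_t b;

    assert(minimal_step_ns(12) == 245);   /* nanosecond-precision clocks */
    assert(minimal_step_ns(10) == 977);   /* microsecond-precision clocks */

    /* a clock that does not advance still yields increasing values */
    a = ascending_ns(&prev, 1000000, 12);
    b = ascending_ns(&prev, 1000000, 12);
    assert(a == 1000000 && b == 1000245);
}
```

The step is one nanosecond more than 1/4096 (or 1/1024) of a millisecond, so two consecutive values always differ in the sub-millisecond field that ends up in the UUID.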

Attachments:

v40-0001-Add-UUID-version-7-generation-function.patch (application/octet-stream)

From 6c80fe539be9ecc81e72216124a938235f6006f8 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v40 1/2] Add UUID version 7 generation function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit implements the uuidv7() SQL function for generating UUID
version 7, as defined in RFC 9562. UUID v7 comprises a Unix timestamp
in milliseconds and random bits, providing uniqueness and
sortability.

In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, referred to as "rand_a" in
the RFC. This ensures additional monotonicity within a millisecond.

Additionally, an alias uuidv4() is added for the existing
gen_random_uuid() SQL function to maintain consistency.

Bump catalog version.

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 +-
 src/backend/utils/adt/uuid.c             | 243 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  56 +++++-
 src/test/regress/sql/uuid.sql            |  28 ++-
 7 files changed, 344 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8b81106fa2..e9a2db2e93 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14255,6 +14255,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14264,12 +14272,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+    This function returns a version 7 UUID (UNIX timestamp with millisecond
+    precision + sub-millisecond timestamp + random).
   </para>
 
   <para>
@@ -14293,7 +14306,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..bb890e9f60 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,34 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
+ * sub-millisecond. While most Unix-like platforms provide nanosecond-precision
+ * timestamps, some systems only offer microsecond precision, limiting us to 10
+ * bits of sub-millisecond information. For example, on macOS, real time is
+ * truncated to microseconds. Additionally, MSVC uses the ported version of
+ * gettimeofday() that returns microsecond precision.
+ *
+ * On systems with only 10 bits of sub-millisecond precision, we still use
+ * 1/4096 parts of a millisecond, but fill lower 2 bits with random numbers
+ * (see generate_uuidv7() for details).
+ *
+ * SUBMS_MINIMAL_STEP defines the minimum number of nanoseconds that guarantees
+ * an increase in the UUID's clock precision.
+ */
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +67,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +433,23 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +461,180 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 */
+	uuid_set_version(uuid, 4);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday() where available, instead use
+	 * clock_gettime() with CLOCK_REALTIME in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^12) fraction of sub-millisecond precision.
+ */
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/*
+	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
+	 * SUBMS_MINIMAL_STEP_BITS)
+	 */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+#if SUBMS_MINIMAL_STEP_BITS == 10
+
+	/*
+	 * On systems that have only 10 bits of sub-ms precision,  2 least
+	 * significant are dependent on other time-specific bits, and they do not
+	 * contribute to uniqueness. To make these bit random we mix in two bits
+	 * from CSPRNG. SUBMS_MINIMAL_STEP is chosen so that we still guarantee
+	 * monotonicity despite altering these bits.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To make correct
+	 * calculating the time shift, we convert the UNIX epoch to TimestampTz
+	 * and use timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry back the nanoseconds.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to an UNIX epoch and carry back
+	 * nanoseconds.
+	 */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate an UUID */
+	uuid = generate_uuidv7(ns);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +644,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +663,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +690,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +698,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..3353e9d6e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8f4ef0d7a6..0059a8c716 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -199,6 +204,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -219,8 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -239,4 +291,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 75ee966ded..6eb8efbd3d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -97,6 +102,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -104,12 +125,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.39.5 (Apple Git-154)

v40-0002-Fixes-to-address-review-notes-by-Peter-Eisentrau.patch

From a7720773eb27f821d603f8aba2cff6453d0797a3 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Fri, 29 Nov 2024 23:04:57 +0500
Subject: [PATCH v40 2/2] Fixes to address review notes by Peter Eisentraut

---
 doc/src/sgml/func.sgml                   |  4 +++-
 src/backend/utils/adt/uuid.c             | 27 ++++++++++++++++++------
 src/include/catalog/pg_proc.dat          | 10 ++++-----
 src/test/regress/expected/opr_sanity.out |  3 ---
 src/test/regress/expected/uuid.out       |  8 +++----
 src/test/regress/sql/uuid.sql            |  8 +++----
 6 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e9a2db2e93..f992d5266c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14282,7 +14282,9 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
     This function returns a version 7 UUID (UNIX timestamp with millisecond
-    precision + sub-millisecond timestamp + random).
+    precision + sub-millisecond timestamp + random). This function can accept
+    optional <parameter>shift</parameter> parameter of type <type>interval</type>
+    which shift internal timestamp by the given interval.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index bb890e9f60..e962f85838 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -433,7 +433,9 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
-/* Set the given UUID version and the variant bits */
+/*
+ * Set the given UUID version and the variant bits
+ */
 static inline void
 uuid_set_version(pg_uuid_t *uuid, unsigned char version)
 {
@@ -469,6 +471,15 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
+/*
+ * Wrapper for gen_random_uuid()
+ */
+Datum
+uuidv4(PG_FUNCTION_ARGS)
+{
+	return gen_random_uuid(fcinfo);
+}
+
 /*
  * Get the current timestamp with nanosecond precision for UUID generation.
  * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
@@ -520,6 +531,9 @@ get_real_time_ns_ascending()
  * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
  * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
  * (or 2^12) fraction of sub-millisecond precision.
+ *
+ * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * used for time-dependent bits of UUID.
  */
 static pg_attribute_always_inline pg_uuid_t *
 generate_uuidv7(int64 ns)
@@ -592,16 +606,17 @@ uuidv7(PG_FUNCTION_ARGS)
 Datum
 uuidv7_interval(PG_FUNCTION_ARGS)
 {
-	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	Interval   *shift = PG_GETARG_INTERVAL_P(0);
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
 	int64		ns = get_real_time_ns_ascending();
 
 	/*
-	 * Shift the current timestamp by the given interval. To make correct
-	 * calculating the time shift, we convert the UNIX epoch to TimestampTz
+	 * Shift the current timestamp by the given interval. To calculate time
+	 * shift correctly, we convert the UNIX epoch to TimestampTz
 	 * and use timestamptz_pl_interval(). Since this calculation is done with
-	 * microsecond precision, we carry back the nanoseconds.
+	 * microsecond precision, we carry nanoseconds from original ns value to
+	 * shifted ns value.
 	 */
 
 	ts = (TimestampTz) (ns / NS_PER_US) -
@@ -610,7 +625,7 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	/* Compute time shift */
 	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
 												 TimestampTzGetDatum(ts),
-												 IntervalPGetDatum(span)));
+												 IntervalPGetDatum(shift)));
 
 	/*
 	 * Convert a TimestampTz value back to an UNIX epoch and carry back
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3353e9d6e3..046ce3eb67 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9343,13 +9343,13 @@
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
 { oid => '9895', descr => 'generate UUID version 4',
-  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
-  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+  proname => 'uuidv4', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv4' },
 { oid => '9896', descr => 'generate UUID version 7',
-  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  proname => 'uuidv7', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
-{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
-  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted by specific interval',
+  proname => 'uuidv7', provolatile => 'v', proargnames => '{shift}',
   prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 43e7180a16..34a32bd11d 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,9 +878,6 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
-uuidv4()
-uuidv7()
-uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 0059a8c716..798633ad51 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -253,26 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
-SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv4());  -- 4
  uuid_extract_version 
 ----------------------
                     4
 (1 row)
 
-SELECT uuid_extract_version(uuidv7()); --7
+SELECT uuid_extract_version(uuidv7());  -- 7
  uuid_extract_version 
 ----------------------
                     7
 (1 row)
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v1
  ?column? 
 ----------
  t
 (1 row)
 
-SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 6eb8efbd3d..110188361d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -125,12 +125,12 @@ SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
-SELECT uuid_extract_version(uuidv4()); --4
-SELECT uuid_extract_version(uuidv7()); --7
+SELECT uuid_extract_version(uuidv4());  -- 4
+SELECT uuid_extract_version(uuidv7());  -- 7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
-SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)

#201

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#200)

Re: UUID v7

On Fri, Nov 29, 2024 at 10:39 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 29 Nov 2024, at 00:46, Peter Eisentraut <peter@eisentraut.org> wrote:

Here as well.

Peter, many thanks for the next round of review. I agree with all corrections.
I'm sending amendments addressing your review as a separate step in patch set. Step 1 of this patch set is identical to v39.

Thank you for updating the patch! Here are two comments:

 <function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
     This function returns a version 7 UUID (UNIX timestamp with millisecond
-    precision + sub-millisecond timestamp + random).
+    precision + sub-millisecond timestamp + random). This function can accept
+    optional <parameter>shift</parameter> parameter of type <type>interval</type>
+    which shift internal timestamp by the given interval.
   </para>

There is no "shift" parameter in the function synopsis.

Also, while reviewing the changes for func.sgml, I noticed that now that
we have 5 UUID functions, it might make sense to create a table for
UUID functions instead of describing each function separately. That
would be more readable and consistent with other functions in the docs.

---
+/*
+ * Wrapper for gen_random_uuid()
+ */
+Datum
+uuidv4(PG_FUNCTION_ARGS)
+{
+ return gen_random_uuid(fcinfo);
+}

Why do we need this? IIUC we marked uuidv4() (and uuidv7()) leakproof
because gen_random_uuid() is marked too. Otherwise, the following test
in opr_sanity would fail:

-- Considering only built-in procs (prolang = 12), look for multiple uses
-- of the same internal function (ie, matching prosrc fields). It's OK to
-- have several entries with different pronames for the same internal function,
-- but conflicts in the number of arguments and other critical items should
-- be complained of. (We don't check data types here; see next query.)
-- Note: ignore aggregate functions here, since they all point to the same
-- dummy built-in function.

Given that these functions don't need to be marked leakproof, does it
make sense to remove the leakproof mark from gen_random_uuid() too?
That way, we don't need the wrapper function.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#202

Marcos Pegoraro

marcos@f10.com.br

about 1 year ago

In reply to: Masahiko Sawada (#201)

Re: UUID v7

Em sex., 29 de nov. de 2024 às 15:49, Masahiko Sawada <sawada.mshk@gmail.com>
escreveu:

<function>uuidv7</function> () <returnvalue>uuid</returnvalue>

Wouldn't it be better to change this to
<function>uuidv7</function> ([interval]) <returnvalue>uuid</returnvalue>
and explain what that param is?

regards
Marcos

#203

sawada.mshk@gmail.com

about 1 year ago

In reply to: Marcos Pegoraro (#202)

Re: UUID v7

On Fri, Nov 29, 2024 at 11:47 AM Marcos Pegoraro <marcos@f10.com.br> wrote:

Em sex., 29 de nov. de 2024 às 15:49, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

<function>uuidv7</function> () <returnvalue>uuid</returnvalue>

Wouldn't it be better to change this to
<function>uuidv7</function> ([interval]) <returnvalue>uuid</returnvalue>
and explain what that param is?

Yes, the function synopsis in the doc should be either:

uuidv7([interval]) -> uuid

uuidv7([shift interval]) -> uuid

Since this function has only one function argument it doesn't
necessarily need an argument name 'shift'. So the proposed description
might be okay but we need to change at least the function synopsis.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#204

sawada.mshk@gmail.com

about 1 year ago

In reply to: Masahiko Sawada (#203)

Re: UUID v7

On Fri, Nov 29, 2024 at 12:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Nov 29, 2024 at 11:47 AM Marcos Pegoraro <marcos@f10.com.br> wrote:

Em sex., 29 de nov. de 2024 às 15:49, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:

<function>uuidv7</function> () <returnvalue>uuid</returnvalue>

Wouldn't it be better to change this to
<function>uuidv7</function> ([interval]) <returnvalue>uuid</returnvalue>
and explain what that param is?

Yes, the function synopsis in the doc should be either:

uuidv7([interval]) -> uuid

or

uuidv7([shift interval]) -> uuid

Since this function has only one function argument it doesn't
necessarily need an argument name 'shift'. So the proposed description
might be okay but we need to change at least the function synopsis.

I realized that the description of uuid_extract_timestamp() needs to
be updated as well since it now supports version 7 too:

<synopsis>
<function>uuid_extract_timestamp</function> (uuid)
<returnvalue>timestamp with time zone</returnvalue>
</synopsis>
This function extracts a <type>timestamp with time zone</type> from UUID
version 1. For other versions, this function returns null. Note that the
extracted timestamp is not necessarily exactly equal to the time the UUID
was generated; this depends on the implementation that generated the UUID.
</para>
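
For reference, the version 7 case amounts to reading the first six bytes of the UUID as a big-endian millisecond count. The following is a minimal standalone sketch of that step, not the patch's code: the function name uuidv7_unix_ms and the array name rfc9562_v7_vector are made up for illustration, and the conversion from the UNIX epoch to a TimestampTz is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): read the 48-bit big-endian
 * unix_ts_ms field from the first six bytes of a 16-byte UUID.
 * Returns -1 if the variant or version bits do not describe a v7 UUID.
 */
int64_t
uuidv7_unix_ms(const unsigned char data[16])
{
    uint64_t    ms = 0;

    if ((data[8] & 0xc0) != 0x80)   /* not the RFC 9562 variant */
        return -1;
    if ((data[6] >> 4) != 7)        /* not version 7 */
        return -1;

    for (int i = 0; i < 6; i++)
        ms = (ms << 8) | data[i];

    return (int64_t) ms;
}

/* The v7 test vector used in the regression test, as raw bytes */
const unsigned char rfc9562_v7_vector[16] = {
    0x01, 0x7F, 0x22, 0xE2, 0x79, 0xB0, 0x7C, 0xC3,
    0x98, 0xC4, 0xDC, 0x0C, 0x0C, 0x07, 0x39, 0x8F
};

/* Usage: the vector decodes to 2022-02-22 19:22:22 UTC in milliseconds */
void
check_rfc9562_v7_vector(void)
{
    assert(uuidv7_unix_ms(rfc9562_v7_vector) == INT64_C(1645557742000));
}
```

The millisecond value is all a version 7 UUID is required to carry, so the sub-millisecond bits that this implementation stores in rand_a are deliberately ignored here.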

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#205

Daniel Verite

daniel@manitou-mail.org

about 1 year ago

In reply to: Andrey M. Borodin (#200)

Re: UUID v7

Andrey M. Borodin wrote:

I'm sending amendments addressing your review as a separate step in patch
set. Step 1 of this patch set is identical to v39.

Some comments about the implementation of monotonicity:

+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;

[...]

+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;

In the case of parallel execution (uuidv7() being parallel-safe), if
there have been previous calls to uuidv7() in that backend,
previous_ns will be set in the backend process,
but zero in a newly spawned worker process.
If (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns) ever happens
to be true in the main process, it will start at false in the workers,
leading to non-monotonic results within the same query.

Also in the case of a backward clock change, we can end up with some
backends sticking to the "old time" plus increment per invocation
until they die, while some other backends spawned after the clock
change are on the "new time". These backends may produce series of
UUIDv7 that would be completely out of sync with each other.
A backward clock change is an abnormality, but if it occurs, what's
the best choice? Take the bullet and switch to the new time, or
stick to a time that is permanently decorrelated from the OS
clock? I would think that the latter is worse.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#206

sawada.mshk@gmail.com

about 1 year ago

In reply to: Daniel Verite (#205)

Re: UUID v7

On Sat, Nov 30, 2024 at 7:01 AM Daniel Verite <daniel@manitou-mail.org> wrote:

Andrey M. Borodin wrote:

I'm sending amendments addressing your review as a separate step in patch
set. Step 1 of this patch set is identical to v39.

Some comments about the implementation of monotonicity:
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+       static int64 previous_ns = 0;
[...]
+       /* Guarantee the minimal step advancement of the timestamp */
+       if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+               ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+       previous_ns = ns;
In the case of parallel execution (uuidv7() being parallel-safe), if
there have been previous calls to uuidv7() in that backend,
previous_ns will be set in the backend process,
but zero in a newly spawned worker process.
If (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns) ever happens
to be true in the main process, it will start at false in the workers,
leading to non-monotonic results within the same query.

The monotonicity of generated UUIDv7 is guaranteed only within a
single backend. I think your point is that UUIDs in parallel query
results might not be ordered. But is it guaranteed that without ORDER
BY clause, the results returned by parallel queries are in the same
order as the results from non-parallel queries in the first place?

Also in the case of a backward clock change, we can end up with some
backends sticking to the "old time" plus increment per invocation
until they die, while some other backends spawned after the clock
change are on the "new time". These backends may produce series of
UUIDv7 that would be completely out of sync with each other.
A backward clock change is an abnormality, but if it occurs, what's
the best choice? Take the bullet and switch to the new time, or
stick to a time that is permanently decorrelated from the OS
clock? I would think that the latter is worse.

IIUC after generating a UUIDv7 with the correct time, even if the
system time goes back, the time in the next UUIDv7 will be
SUBMS_MINIMAL_STEP_NS nanoseconds ahead of the last correct time.
Also, in the case where the backend generates its first UUIDv7 with an
incorrect (e.g. an old) time, it generates UUIDv7 based on the
incorrect timestamp. However, it starts generating UUIDv7 with the
correct timestamp as soon as the system time goes back to the correct
time. So I don't think it can happen that one backend sticks
to an old time while another backend uses the correct timestamp to
generate UUIDv7. Note that we use (the previous timestamp +
SUBMS_MINIMAL_STEP_NS) only if the system clock didn't move forward by
SUBMS_MINIMAL_STEP_NS.
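
To make the cases above easier to follow, here is a small standalone model of the clamp in get_real_time_ns_ascending(). It is only a sketch: the clock reading is passed in as an argument instead of coming from clock_gettime(), the per-backend static variable is replaced by a pointer so that several "backends" can be modelled, and STEP_NS = 245 is an assumed value standing in for SUBMS_MINIMAL_STEP_NS. The names next_ns and backward_clock_demo are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for SUBMS_MINIMAL_STEP_NS (about 1 ms / 4096, rounded up) */
#define STEP_NS 245

/*
 * Same comparison as in the patch, but with the clock reading and the
 * per-backend state passed in explicitly.
 */
int64_t
next_ns(int64_t *previous_ns, int64_t clock_ns)
{
    int64_t     ns = clock_ns;

    assert(previous_ns != NULL);

    /* Guarantee the minimal step advancement of the timestamp */
    if (*previous_ns + STEP_NS >= ns)
        ns = *previous_ns + STEP_NS;
    *previous_ns = ns;

    return ns;
}

/* Trace one long-lived state and one fresh state across a clock step-back */
void
backward_clock_demo(int64_t out[5])
{
    int64_t     old_backend = 0;
    int64_t     new_backend = 0;

    out[0] = next_ns(&old_backend, 5000000);    /* normal reading */
    out[1] = next_ns(&old_backend, 1000000);    /* clock stepped back */
    out[2] = next_ns(&old_backend, 1000100);    /* clock still behind */
    out[3] = next_ns(&new_backend, 1000100);    /* state starting from zero */
    out[4] = next_ns(&old_backend, 9000000);    /* clock has passed old value */
}
```

In this trace the first state returns 5000000, 5000245 and 5000490 while the clock reads lower values, the state that starts from zero returns 1000100, and the first state returns 9000000 once the clock reading exceeds its previous value plus the step.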

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#207

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#206)

3 attachment(s)

Re: UUID v7

On 2 Dec 2024, at 11:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The monotonicity of generated UUIDv7 is guaranteed only within a
single backend.

I've addressed all items except formatting the functions as a table: I can't build the docs, so I can't make a reasonable judgement on whether the table looks OK.

Changes:
- restored leakproof flag of functions without arguments to be consistent with gen_random_uuid()
- improved uuidv7() synopsis

Also, PFA a prototype that makes uuidv7() ordered across all backends by keeping previous_ns in shared memory. IMO it overcomplicates things, and the RFC does not require such guarantees. It would also cost us several hundred ns on each uuidv7() call. I think we should focus on committing the existing implementation and leave such things for a future improvement.
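
For readers who do not want to open the attachment, the shared-counter idea can be sketched roughly as follows. This is a standalone approximation that uses C11 <stdatomic.h> in place of PostgreSQL's pg_atomic_* API, which the attached prototype uses; next_uuid_ns_shared() and STEP_NS are hypothetical names, and the 245 ns step assumes 12 bits of sub-millisecond precision.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SUBMS_MINIMAL_STEP_NS (12-bit case). */
#define STEP_NS UINT64_C(245)

/*
 * Advance a counter shared by all generators and return the timestamp (ns)
 * that the caller should embed in its UUID.
 */
static uint64_t
next_uuid_ns_shared(_Atomic uint64_t *previous_ns, uint64_t clock_ns)
{
	uint64_t	seen;

	assert(previous_ns != NULL);
	seen = atomic_load(previous_ns);

	for (;;)
	{
		uint64_t	candidate = clock_ns;

		/* Never hand out less than the last published value plus a step. */
		if (seen + STEP_NS >= clock_ns)
			candidate = seen + STEP_NS;

		/* On failure, 'seen' is reloaded with the value another caller stored. */
		if (atomic_compare_exchange_weak(previous_ns, &seen, candidate))
			return candidate;
	}
}
```

Every call contends on the same memory location and may retry under load, which is where the extra per-call cost mentioned above comes from.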

Thanks!

Best regards, Andrey Borodin.

Attachments:

v41-0001-Add-UUID-version-7-generation-function.patch

From 6c80fe539be9ecc81e72216124a938235f6006f8 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v41 1/3] Add UUID version 7 generation function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit implements the uuidv7() SQL function for generating UUID
version 7, as defined in RFC 9562. UUID v7 comprises a Unix timestamp
in milliseconds and random bits, providing uniqueness and
sortability.

In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, referred to as "rand_a" in
the RFC. This ensures additional monotonicity within a millisecond.

Additionally, an alias uuidv4() is added for the existing
gen_random_uuid() SQL function to maintain consistency.

Bump catalog version.

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml               |   2 +-
 doc/src/sgml/func.sgml                   |  21 +-
 src/backend/utils/adt/uuid.c             | 243 ++++++++++++++++++++++-
 src/include/catalog/pg_proc.dat          |  11 +-
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/uuid.out       |  56 +++++-
 src/test/regress/sql/uuid.sql            |  28 ++-
 7 files changed, 344 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1..3e6751d64c 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8b81106fa2..e9a2db2e93 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14255,6 +14255,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14264,12 +14272,17 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+    This function returns a version 7 UUID (UNIX timestamp with millisecond
+    precision + sub-millisecond timestamp + random).
   </para>
 
   <para>
@@ -14293,7 +14306,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc..bb890e9f60 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,34 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
+ * sub-millisecond. While most Unix-like platforms provide nanosecond-precision
+ * timestamps, some systems only offer microsecond precision, limiting us to 10
+ * bits of sub-millisecond information. For example, on macOS, real time is
+ * truncated to microseconds. Additionally, MSVC uses the ported version of
+ * gettimeofday() that returns microsecond precision.
+ *
+ * On systems with only 10 bits of sub-millisecond precision, we still use
+ * 1/4096 parts of a millisecond, but fill lower 2 bits with random numbers
+ * (see generate_uuidv7() for details).
+ *
+ * SUBMS_MINIMAL_STEP_NS defines the minimum number of nanoseconds that
+ * guarantees an increase in the UUID's sub-millisecond timestamp field.
+ */
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +67,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +433,23 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/* Set the given UUID version and the variant bits */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +461,180 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
+	 */
+	uuid_set_version(uuid, 4);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
+ * than the previous returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday() where available, instead use
+	 * clock_gettime() with CLOCK_REALTIME in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note that a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms. On some platforms
+	 * such as macOS, it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
+ * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
+ * (or 2^12) fraction of sub-millisecond precision.
+ */
+static pg_attribute_always_inline pg_uuid_t *
+generate_uuidv7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/*
+	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
+	 * SUBMS_MINIMAL_STEP_BITS)
+	 */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+#if SUBMS_MINIMAL_STEP_BITS == 10
+
+	/*
+	 * On systems that have only 10 bits of sub-ms precision, the 2 least
+	 * significant bits depend on other time-specific bits and do not
+	 * contribute to uniqueness. To make these bits random, we mix in two bits
+	 * from the CSPRNG. SUBMS_MINIMAL_STEP_NS is chosen so that we still
+	 * guarantee monotonicity despite altering these bits.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To make correct
+	 * calculating the time shift, we convert the UNIX epoch to TimestampTz
+	 * and use timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry back the nanoseconds.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(span)));
+
+	/*
+	 * Convert a TimestampTz value back to an UNIX epoch and carry back
+	 * nanoseconds.
+	 */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate an UUID */
+	uuid = generate_uuidv7(ns);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +644,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +663,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +690,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +698,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..3353e9d6e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,11 +9342,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
+  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..43e7180a16 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -878,6 +878,9 @@ crc32(bytea)
 crc32c(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
+uuidv4()
+uuidv7()
+uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8f4ef0d7a6..0059a8c716 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -199,6 +204,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -219,8 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4()); --4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7()); --7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -239,4 +291,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 75ee966ded..6eb8efbd3d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -97,6 +102,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -104,12 +125,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv7()); --7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.39.5 (Apple Git-154)

v41-0003-Make-UUIDv7-ordered-across-all-backends.patch

From 70d02718e454399eda8a7f0fc0636c92e5e10897 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Wed, 4 Dec 2024 19:35:13 +0300
Subject: [PATCH v41 3/3] Make UUIDv7 ordered across all backends

---
 src/backend/storage/ipc/ipci.c |  3 ++
 src/backend/utils/adt/uuid.c   | 50 ++++++++++++++++++++++++++++++----
 src/include/utils/uuid.h       |  3 ++
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 7783ba854f..71b0266cc4 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -50,6 +50,7 @@
 #include "storage/sinvaladt.h"
 #include "utils/guc.h"
 #include "utils/injection_point.h"
+#include "utils/uuid.h"
 
 /* GUCs */
 int			shared_memory_type = DEFAULT_SHARED_MEMORY_TYPE;
@@ -148,6 +149,7 @@ CalculateShmemSize(int *num_semaphores)
 	size = add_size(size, WaitEventCustomShmemSize());
 	size = add_size(size, InjectionPointShmemSize());
 	size = add_size(size, SlotSyncShmemSize());
+	size = add_size(size, UuidShmemSize());
 
 	/* include additional requested shmem from preload libraries */
 	size = add_size(size, total_addin_request);
@@ -340,6 +342,7 @@ CreateOrAttachShmemStructs(void)
 	StatsShmemInit();
 	WaitEventCustomShmemInit();
 	InjectionPointShmemInit();
+	UuidShmemInit();
 }
 
 /*
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index b09297449a..2af100a915 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -18,7 +18,9 @@
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "port/atomics.h"
 #include "port/pg_bswap.h"
+#include "storage/shmem.h"
 #include "utils/fmgrprotos.h"
 #include "utils/guc.h"
 #include "utils/sortsupport.h"
@@ -471,6 +473,35 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 	PG_RETURN_UUID_P(uuid);
 }
 
+static pg_atomic_uint64 *previous_ns = NULL;
+
+/* Report shared memory space needed by previous_ns */
+Size
+UuidShmemSize(void)
+{
+	Size		size = 0;
+
+	size = add_size(size, sizeof(pg_atomic_uint64)+100);
+
+	return size;
+}
+
+/* Allocate and initialize previous_ns shared memory */
+void
+UuidShmemInit(void)
+{
+	bool		found;
+
+	previous_ns = (pg_atomic_uint64 *)
+		ShmemInitStruct("UUID timestamp", UuidShmemSize(), &found);
+
+	if (!found)
+	{
+		/* First time through, so initialize */
+		pg_atomic_init_u64(previous_ns, 0);
+	}
+}
+
 /*
  * Get the current timestamp with nanosecond precision for UUID generation.
  * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
@@ -479,7 +510,6 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 static inline int64
 get_real_time_ns_ascending()
 {
-	static int64 previous_ns = 0;
 	int64		ns;
 
 	/* Get the current real timestamp */
@@ -505,10 +535,20 @@ get_real_time_ns_ascending()
 	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
 #endif
 
-	/* Guarantee the minimal step advancement of the timestamp */
-	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
-		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
-	previous_ns = ns;
+	/* Guarantee the minimal step advancement of the timestamp across all backends */
+	while (true)
+	{
+		uint64 copy_pns = pg_atomic_read_u64(previous_ns);
+		uint64 copy_ns = ns;
+		if (copy_pns + SUBMS_MINIMAL_STEP_NS >= ns)
+			copy_ns = copy_pns + SUBMS_MINIMAL_STEP_NS;
+
+		if (pg_atomic_compare_exchange_u64(previous_ns, &copy_pns, copy_ns))
+		{
+			ns = copy_ns;
+			break;
+		}
+	}
 
 	return ns;
 }
diff --git a/src/include/utils/uuid.h b/src/include/utils/uuid.h
index ae631e75d5..e6ba83a9db 100644
--- a/src/include/utils/uuid.h
+++ b/src/include/utils/uuid.h
@@ -39,4 +39,7 @@ DatumGetUUIDP(Datum X)
 
 #define PG_GETARG_UUID_P(X)		DatumGetUUIDP(PG_GETARG_DATUM(X))
 
+extern Size UuidShmemSize(void);
+extern void UuidShmemInit(void);
+
 #endif							/* UUID_H */
-- 
2.39.5 (Apple Git-154)

v41-0002-Fixes-to-address-review-notes-by-Peter-Eisentrau.patch

From f66cde033f024403227c1d036dee410f3fe62a16 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Fri, 29 Nov 2024 23:04:57 +0500
Subject: [PATCH v41 2/3] Fixes to address review notes by Peter Eisentraut

---
 doc/src/sgml/func.sgml                   | 13 ++++++++-----
 src/backend/utils/adt/uuid.c             | 18 ++++++++++++------
 src/include/catalog/pg_proc.dat          | 12 ++++++------
 src/test/regress/expected/opr_sanity.out |  1 -
 src/test/regress/expected/uuid.out       |  8 ++++----
 src/test/regress/sql/uuid.sql            |  8 ++++----
 6 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e9a2db2e93..bf3a304134 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14279,10 +14279,12 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 </synopsis>
    These functions return a version 4 (random) UUID.
 <synopsis>
-<function>uuidv7</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv7</function> (<optional> <parameter>shift</parameter> <type>interval</type> </optional>) <returnvalue>uuid</returnvalue>
 </synopsis>
     This function returns a version 7 UUID (UNIX timestamp with millisecond
-    precision + sub-millisecond timestamp + random).
+    precision + sub-millisecond timestamp + random). This function accepts an
+    optional <parameter>shift</parameter> parameter of type <type>interval</type>,
+    which shifts the internal timestamp by the given interval.
   </para>
 
   <para>
@@ -14296,9 +14298,10 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamp with time zone</returnvalue>
 </synopsis>
    This function extracts a <type>timestamp with time zone</type> from UUID
-   version 1.  For other versions, this function returns null.  Note that the
-   extracted timestamp is not necessarily exactly equal to the time the UUID
-   was generated; this depends on the implementation that generated the UUID.
+   versions 1 and 7.  For other versions, this function returns null.  Note that
+   the extracted timestamp is not necessarily exactly equal to the time the
+   UUID was generated; this depends on the implementation that generated the
+   UUID.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index bb890e9f60..b09297449a 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -433,7 +433,9 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
-/* Set the given UUID version and the variant bits */
+/*
+ * Set the given UUID version and the variant bits
+ */
 static inline void
 uuid_set_version(pg_uuid_t *uuid, unsigned char version)
 {
@@ -520,6 +522,9 @@ get_real_time_ns_ascending()
  * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)".
  * This method utilizes 12 bits from the "rand_a" bits to store a 1/4096
  * (or 2^12) fraction of sub-millisecond precision.
+ *
+ * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * used for time-dependent bits of UUID.
  */
 static pg_attribute_always_inline pg_uuid_t *
 generate_uuidv7(int64 ns)
@@ -592,16 +597,17 @@ uuidv7(PG_FUNCTION_ARGS)
 Datum
 uuidv7_interval(PG_FUNCTION_ARGS)
 {
-	Interval   *span = PG_GETARG_INTERVAL_P(0);
+	Interval   *shift = PG_GETARG_INTERVAL_P(0);
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
 	int64		ns = get_real_time_ns_ascending();
 
 	/*
-	 * Shift the current timestamp by the given interval. To make correct
-	 * calculating the time shift, we convert the UNIX epoch to TimestampTz
+	 * Shift the current timestamp by the given interval. To calculate time
+	 * shift correctly, we convert the UNIX epoch to TimestampTz
 	 * and use timestamptz_pl_interval(). Since this calculation is done with
-	 * microsecond precision, we carry back the nanoseconds.
+	 * microsecond precision, we carry nanoseconds from original ns value to
+	 * shifted ns value.
 	 */
 
 	ts = (TimestampTz) (ns / NS_PER_US) -
@@ -610,7 +616,7 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	/* Compute time shift */
 	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
 												 TimestampTzGetDatum(ts),
-												 IntervalPGetDatum(span)));
+												 IntervalPGetDatum(shift)));
 
 	/*
 	 * Convert a TimestampTz value back to an UNIX epoch and carry back
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3353e9d6e3..a218c3e22b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9342,14 +9342,14 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
-{ oid => '9895', descr => 'generate UUID version 4',
-  proname => 'uuidv4', proleakproof => 't', provolatile => 'v',
+{ oid => '9895', descr => 'generate UUID version 4', proleakproof => 't',
+  proname => 'uuidv4', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
-{ oid => '9896', descr => 'generate UUID version 7',
-  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+{ oid => '9896', descr => 'generate UUID version 7', proleakproof => 't',
+  proname => 'uuidv7', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
-{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted on specific interval',
-  proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted by specific interval',
+  proname => 'uuidv7', provolatile => 'v', proargnames => '{shift}',
   prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 43e7180a16..2a9e15b39b 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -880,7 +880,6 @@ bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 uuidv4()
 uuidv7()
-uuidv7(interval)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 0059a8c716..798633ad51 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -253,26 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
-SELECT uuid_extract_version(uuidv4()); --4
+SELECT uuid_extract_version(uuidv4());  -- 4
  uuid_extract_version 
 ----------------------
                     4
 (1 row)
 
-SELECT uuid_extract_version(uuidv7()); --7
+SELECT uuid_extract_version(uuidv7());  -- 7
  uuid_extract_version 
 ----------------------
                     7
 (1 row)
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v1
  ?column? 
 ----------
  t
 (1 row)
 
-SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 6eb8efbd3d..110188361d 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -125,12 +125,12 @@ SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
-SELECT uuid_extract_version(uuidv4()); --4
-SELECT uuid_extract_version(uuidv7()); --7
+SELECT uuid_extract_version(uuidv4());  -- 4
+SELECT uuid_extract_version(uuidv7());  -- 7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v1
-SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00'; -- RFC 9562 test vector for v7
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
-- 
2.39.5 (Apple Git-154)

#208

sergeyprokhorenko@yahoo.com.au

about 1 year ago

In reply to: Masahiko Sawada (#199)

Benchmark function for uuidv7()

On Friday 29 November 2024 at 08:55:09 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Fri, Nov 29, 2024 at 5:59 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Friday 29 November 2024 at 09:19:33 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Thu, Nov 28, 2024 at 8:13 PM Sergey Prokhorenko

<sergeyprokhorenko@yahoo.com.au> wrote:

I don't mean adding benchmark results to the patch, but benchmark functions, so that everyone can run the comparison themselves on their own equipment. The comparison with UUIDv4 is not very interesting, as the choice is usually between UUIDv7 and an integer key. Also, I have described many use cases, and your benchmark covers only one, the simplest.

I don't think we should add such benchmark functions at least to this
patch. If there already is a well-established workload using UUIDv7
and UUIDv4 etc, users can use pgbench with custom scripts, or it might
make sense to add it to pgbench as a built-in workload. Which however
should be a separate patch. Having said that, I think users should use
benchmarks that fit their workloads, and it would not be easy to
establish workloads that are reasonable for most systems.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Workloads can and must be parameterized. Typically, companies use test tables of 10,000 and 1,000,000 records, etc. Different companies have mostly similar usage scenarios (for example, incremental loading). Each company has to duplicate the work of others, creating the same benchmarks. The worst thing is that this is entrusted to employees who do not understand typical key usage scenarios well. As a rule, these are programmers, not system analysts. Accordingly, the decision in 99% of cases will be in favor of integer keys, as they take up less space and are generated faster. If we leave this problem until the next patch, it will take us a year and a half. This is completely wrong.

There are still 4 months left until the feature freeze. We can discuss
this topic and might find solutions. I don't think it's a blocker of
this patch (UUIDv7 implementation patch).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

_________________________________________________________________________________________
_________________________________________________________________________________________

I am not a programmer, but a systems analyst. Therefore, I cannot develop a function for the benchmark myself, but I can describe the requirements for this function.
The function for the benchmark could be implemented as a separate patch (not UUIDv7 implementation patch) to avoid blocking the UUIDv7 implementation patch.
Requirements for the uuidv7() benchmark function

1. Benchmark function

The function for uuidv7() benchmark should be called uuidv7_benchmark().
This function should have parameters:
• optional parameter mock_table_record_count. If the parameter is not passed, then the value of 1 million records is taken
• optional parameter payload_size_b. If the parameter is not passed, then the value of 2048 bytes is taken

2. Benchmark results table

The result of each benchmark step for a certain surrogate key type should be dumped into a separate row of the uuidv7_benchmark_results table. This table should contain the following columns:
• benchmark_start_datetime
• mock_table_record_count
• step_name
• surrogate_key_type
• rate_per_ms (nullable)
• cpu_usage_percent (nullable)
• memory_usage_mb (nullable)
• drive_usage_mb_per_s (nullable)

The table rows should be sequentially sorted by the columns benchmark_start_datetime, step_name, surrogate_key_type.

Sample benchmark results table:

| benchmark_start_datetime | mock_table_record_count | step_name         | surrogate_key_type | rate_per_ms | cpu_usage_percent | memory_usage_mb | drive_usage_mb_per_s |
| ------------------------ | ----------------------- | ----------------- | ------------------ | ----------- | ----------------- | --------------- | -------------------- |
|                          |                         | 1_insert          | UUIDv7             |             |                   |                 |                      |
|                          |                         | 1_insert          | BIGSERIAL          |             |                   |                 |                      |
|                          |                         | 1_insert          | UUIDv4             |             |                   |                 |                      |
|                          |                         | 2_parallel_insert | UUIDv7             |             |                   |                 |                      |
|                          |                         | 2_parallel_insert | BIGSERIAL          |             |                   |                 |                      |
|                          |                         | 2_parallel_insert | UUIDv4             |             |                   |                 |                      |
|                          |                         | 3_left_join       | UUIDv7             |             |                   |                 |                      |
|                          |                         | 3_left_join       | BIGSERIAL          |             |                   |                 |                      |
|                          |                         | 3_left_join       | UUIDv4             |             |                   |                 |                      |
|                          |                         | …                 | …                  |             |                   |                 |                      |

3. Mock table

Each record in the mock_table table must contain the following columns:
• id with the UUID data type, PRIMARY KEY (indexed)
• payload with the bytea data type

4. Compared surrogate key types

Each benchmark step is run sequentially with the following surrogate key types in the mock_table.id column:
• UUIDv7
• BIGSERIAL
• The name of the function for generating surrogate keys (for example, one of the formats: UUIDv4, ULID or Snowflake ID), if the user specifies such a function and makes it available

5. Benchmark steps

Before running the benchmark steps, a mock table is created.
Pseudocode:

    CREATE TABLE mock_table (
        id UUID PRIMARY KEY DEFAULT uuidv7(),
        payload BYTEA
    );

The benchmark must have the following steps:

step_name = '1_insert'
Pseudocode:

    INSERT INTO mock_table (payload)
    SELECT filled_payload(payload_size_b)
    FROM generate_series(1, mock_table_record_count);

step_name = '2_parallel_insert'
The algorithm is at the discretion of the developer.

step_name = '3_left_join'
Pseudocode:

    SELECT COUNT(*)
    FROM mock_table a
    LEFT JOIN mock_table b ON b.id = a.id
    WHERE b.id IS NULL;

step_name = '4_inner_join'
Pseudocode:

    SELECT COUNT(*)
    FROM mock_table a
    INNER JOIN mock_table b ON b.id = a.id
    WHERE b.id IS NULL;

step_name = '5_group_by'
Pseudocode:

    SELECT id, COUNT(*)
    FROM mock_table
    GROUP BY id
    HAVING COUNT(*) > 1;

step_name = '6_delete'
Pseudocode:

    DELETE FROM mock_table a
    USING mock_table b
    WHERE b.id = a.id;

Regards,

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

#209

[1]: /messages/by-id/CAD21AoBE1ePPWY1NQEgk3DkqjYzLPZwYTzCySHm0e+9a69PfZw@mail.gmail.com

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#207)

2 attachment(s)

Re: UUID v7

On Wed, Dec 4, 2024 at 9:04 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 2 Dec 2024, at 11:00, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

The monotonicity of generated UUIDv7 is guaranteed only within a
single backend.

I've addressed all items, except formatting a table... I can't build docs to make a reasonable judgement if the table looks OK.

Thank you for updating the patch!

Changes:
- restored leakproof flag of functions without arguments to be consistent with gen_random_uuid()

If I understand Peter's comment below correctly, we don't need to
mark all three functions leakproof:

* src/test/regress/expected/opr_sanity.out
+uuidv4()
+uuidv7()
+uuidv7(interval)
Functions without arguments don't need to be marked leakproof.

uuidv7(interval) internally calls timestamptz_pl_interval(), which is
not leakproof, so I don't think that classification is sound.

I've attached the updated patches. The 0001 patch unmarks the existing
gen_random_uuid() leakproof and is being discussed on another
thread[1]. I'm going to push the main UUIDv7 patch barring objections
and further comments, after pushing the fix for gen_random_uuid().

Also PFA a prototype of making uuidv7() ordered across all backends via keeping previous_ns in shared memory. IMO it's overcomplicating and RFC does not require such guarantees. Also, this would cost us several hundreds of ns on each uuidv7() call. I think we should focus on committing existing implementation and leave such things for a future improvement.

I also feel like it's overcomplicating. We can focus on the main patch
and can implement it later if we really need it.
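
For readers wondering what ordering across backends would involve, here is a rough sketch of the idea of keeping the previous timestamp in shared state. It is not the attached prototype: it uses C11 atomics on a process-global variable as a stand-in for shared memory, and the names shared_previous_ns and next_ascending_ns_shared() are hypothetical. A real backend implementation would presumably place the value in PostgreSQL shared memory and use PostgreSQL's own atomics API instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Same minimal step as the 12-bit case in the patch: 1000000 / 4096 + 1 */
#define SHARED_STEP_NS 245

/* Stand-in for a value that would live in shared memory (assumption) */
static _Atomic int64_t shared_previous_ns;

/*
 * Hypothetical helper: every caller, regardless of which thread or process
 * it runs in, obtains a timestamp at least SHARED_STEP_NS greater than the
 * last one handed out.
 */
static int64_t
next_ascending_ns_shared(int64_t now_ns)
{
	int64_t		prev = atomic_load(&shared_previous_ns);
	int64_t		ns;

	/* Timestamps are nanoseconds since the UNIX epoch */
	assert(now_ns >= 0);

	do
	{
		ns = now_ns;
		if (prev + SHARED_STEP_NS >= ns)
			ns = prev + SHARED_STEP_NS;
		/* On failure, prev is refreshed with the current value and we retry */
	} while (!atomic_compare_exchange_weak(&shared_previous_ns, &prev, ns));

	return ns;
}
```

The compare-and-swap loop is where the extra per-call cost mentioned above would come from: under contention, each call may have to retry, and every call touches a cache line shared by all callers.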

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v42-0001-Unmark-gen_random_uuid-leakproof.patch (application/x-patch)

From 960c0df86affe4cbc6feba74df6d9643e37506e4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 9 Dec 2024 13:17:27 -0800
Subject: [PATCH v42 1/3] Unmark gen_random_uuid() leakproof.

---
 src/include/catalog/pg_proc.dat          | 2 +-
 src/test/regress/expected/opr_sanity.out | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9575524007f..ccf79761da5 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9345,7 +9345,7 @@
   proname => 'uuid_hash_extended', prorettype => 'int8',
   proargtypes => 'uuid int8', prosrc => 'uuid_hash_extended' },
 { oid => '3432', descr => 'generate random UUID',
-  proname => 'gen_random_uuid', proleakproof => 't', provolatile => 'v',
+  proname => 'gen_random_uuid', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d2..452f2572302 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -855,7 +855,6 @@ sha224(bytea)
 sha256(bytea)
 sha384(bytea)
 sha512(bytea)
-gen_random_uuid()
 starts_with(text,text)
 macaddr8_eq(macaddr8,macaddr8)
 macaddr8_lt(macaddr8,macaddr8)
-- 
2.43.5

v42-0002-Add-UUID-version-7-generation-function.patch (application/x-patch)

From cb7912a7d97e78253ce498d23224097cf2f5d895 Mon Sep 17 00:00:00 2001
From: "Andrey M. Borodin" <x4mmm@night.local>
Date: Wed, 20 Mar 2024 22:30:14 +0500
Subject: [PATCH v42 2/3] Add UUID version 7 generation function.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit implements the uuidv7() SQL function for generating UUID
version 7, as defined in RFC 9562. UUID v7 comprises a Unix timestamp
in milliseconds and random bits, providing uniqueness and
sortability.

In our implementation, the 12-bit sub-millisecond timestamp fraction
is stored immediately after the timestamp, referred to as "rand_a" in
the RFC. This ensures additional monotonicity within a millisecond.

Additionally, an alias uuidv4() is added for the existing
gen_random_uuid() SQL function to maintain consistency.

XXX Bump catalog version.

Author: Andrey Borodin
Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch
Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl
Reviewed-by: Michael Paquier, Masahiko Sawada, Stepan Neretin
Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com
---
 doc/src/sgml/datatype.sgml         |   2 +-
 doc/src/sgml/func.sgml             |  30 +++-
 src/backend/utils/adt/uuid.c       | 248 +++++++++++++++++++++++++++--
 src/include/catalog/pg_proc.dat    |  11 +-
 src/test/regress/expected/uuid.out |  56 ++++++-
 src/test/regress/sql/uuid.sql      |  28 +++-
 6 files changed, 352 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index e0d33f12e1c..3e6751d64cc 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -4380,7 +4380,7 @@ SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
 
    <para>
     The data type <type>uuid</type> stores Universally Unique Identifiers
-    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
+    (UUID) as defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>,
     ISO/IEC 9834-8:2005, and related standards.
     (Some systems refer to this data type as a globally unique identifier, or
     GUID,<indexterm><primary>GUID</primary></indexterm> instead.)  This
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8b81106fa23..bf3a3041344 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14255,6 +14255,14 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
    <primary>gen_random_uuid</primary>
   </indexterm>
 
+  <indexterm>
+   <primary>uuidv4</primary>
+  </indexterm>
+
+  <indexterm>
+   <primary>uuidv7</primary>
+  </indexterm>
+
   <indexterm>
    <primary>uuid_extract_timestamp</primary>
   </indexterm>
@@ -14264,12 +14272,19 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
   </indexterm>
 
   <para>
-   <productname>PostgreSQL</productname> includes one function to generate a UUID:
+   <productname>PostgreSQL</productname> includes several functions to generate a UUID.
 <synopsis>
 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>
+<function>uuidv4</function> () <returnvalue>uuid</returnvalue>
+</synopsis>
+   These functions return a version 4 (random) UUID.
+<synopsis>
+<function>uuidv7</function> (<optional> <parameter>shift</parameter> <type>interval</type> </optional>) <returnvalue>uuid</returnvalue>
 </synopsis>
-   This function returns a version 4 (random) UUID.  This is the most commonly
-   used type of UUID and is appropriate for most applications.
+    This function returns a version 7 UUID (UNIX timestamp with millisecond
+    precision + sub-millisecond timestamp + random). This function can accept
+    an optional <parameter>shift</parameter> parameter of type <type>interval</type>,
+    which shifts the internal timestamp by the given interval.
   </para>
 
   <para>
@@ -14283,9 +14298,10 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamp with time zone</returnvalue>
 </synopsis>
    This function extracts a <type>timestamp with time zone</type> from UUID
-   version 1.  For other versions, this function returns null.  Note that the
-   extracted timestamp is not necessarily exactly equal to the time the UUID
-   was generated; this depends on the implementation that generated the UUID.
+   version 1 and 7.  For other versions, this function returns null.  Note that
+   the extracted timestamp is not necessarily exactly equal to the time the
+   UUID was generated; this depends on the implementation that generated the
+   UUID.
   </para>
 
   <para>
@@ -14293,7 +14309,7 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 <function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>
 </synopsis>
    This function extracts the version from a UUID of the variant described by
-   <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>.  For
+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For
    other variants, this function returns null.  For example, for a UUID
    generated by <function>gen_random_uuid</function>, this function will
    return 4.
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 5284d23dcc4..5842e8b9f4a 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -13,6 +13,8 @@
 
 #include "postgres.h"
 
+#include <time.h>				/* for clock_gettime() */
+
 #include "common/hashfn.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
@@ -23,6 +25,34 @@
 #include "utils/timestamp.h"
 #include "utils/uuid.h"
 
+/* helper macros */
+#define NS_PER_S	INT64CONST(1000000000)
+#define NS_PER_MS	INT64CONST(1000000)
+#define NS_PER_US	INT64CONST(1000)
+
+/*
+ * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
+ * sub-millisecond. While most Unix-like platforms provide nanosecond-precision
+ * timestamps, some systems only offer microsecond precision, limiting us to 10
+ * bits of sub-millisecond information. For example, on macOS, real time is
+ * truncated to microseconds. Additionally, MSVC uses the ported version of
+ * gettimeofday() that returns microsecond precision.
+ *
+ * On systems with only 10 bits of sub-millisecond precision, we still use
+ * 1/4096 parts of a millisecond, but fill lower 2 bits with random numbers
+ * (see generate_uuidv7() for details).
+ *
+ * SUBMS_MINIMAL_STEP_NS defines the minimum number of nanoseconds that guarantees
+ * an increase in the UUID's clock precision.
+ */
+#if defined(__darwin__) || defined(_MSC_VER)
+#define SUBMS_MINIMAL_STEP_BITS 10
+#else
+#define SUBMS_MINIMAL_STEP_BITS 12
+#endif
+#define SUBMS_BITS	12
+#define SUBMS_MINIMAL_STEP_NS ((NS_PER_MS / (1 << SUBMS_MINIMAL_STEP_BITS)) + 1)
+
 /* sortsupport for uuid */
 typedef struct
 {
@@ -37,6 +67,8 @@ static int	uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2);
 static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
+static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
+static inline int64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -401,6 +433,25 @@ uuid_hash_extended(PG_FUNCTION_ARGS)
 	return hash_any_extended(key->data, UUID_LEN, PG_GETARG_INT64(1));
 }
 
+/*
+ * Set the given UUID version and the variant bits
+ */
+static inline void
+uuid_set_version(pg_uuid_t *uuid, unsigned char version)
+{
+	/* set version field, top four bits */
+	uuid->data[6] = (uuid->data[6] & 0x0f) | (version << 4);
+
+	/* set variant field, top two bits are 1, 0 */
+	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;
+}
+
+/*
+ * Generate UUID version 4.
+ *
+ * All UUID bytes are filled with strong random numbers except version and
+ * variant bits.
+ */
 Datum
 gen_random_uuid(PG_FUNCTION_ARGS)
 {
@@ -412,21 +463,183 @@ gen_random_uuid(PG_FUNCTION_ARGS)
 				 errmsg("could not generate random values")));
 
 	/*
-	 * Set magic numbers for a "version 4" (pseudorandom) UUID, see
-	 * http://tools.ietf.org/html/rfc4122#section-4.4
+	 * Set magic numbers for a "version 4" (pseudorandom) UUID and variant,
+	 * see https://datatracker.ietf.org/doc/html/rfc9562#name-uuid-version-4
 	 */
-	uuid->data[6] = (uuid->data[6] & 0x0f) | 0x40;	/* time_hi_and_version */
-	uuid->data[8] = (uuid->data[8] & 0x3f) | 0x80;	/* clock_seq_hi_and_reserved */
+	uuid_set_version(uuid, 4);
 
 	PG_RETURN_UUID_P(uuid);
 }
 
-#define UUIDV1_EPOCH_JDATE  2299161 /* == date2j(1582,10,15) */
+/*
+ * Get the current timestamp with nanosecond precision for UUID generation.
+ * The returned timestamp is guaranteed to be at least SUBMS_MINIMAL_STEP_NS
+ * greater than the previously returned timestamp (on this backend).
+ */
+static inline int64
+get_real_time_ns_ascending()
+{
+	static int64 previous_ns = 0;
+	int64		ns;
+
+	/* Get the current real timestamp */
+
+#ifdef	_MSC_VER
+	struct timeval tmp;
+
+	gettimeofday(&tmp, NULL);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_usec * NS_PER_US;
+#else
+	struct timespec tmp;
+
+	/*
+	 * We don't use gettimeofday(), instead use clock_gettime() with
+	 * CLOCK_REALTIME where available in order to get a high-precision
+	 * (nanoseconds) real timestamp.
+	 *
+	 * Note while a timestamp returned by clock_gettime() with CLOCK_REALTIME
+	 * is nanosecond-precision on most Unix-like platforms, on some platforms
+	 * such as macOS it's restricted to microsecond-precision.
+	 */
+	clock_gettime(CLOCK_REALTIME, &tmp);
+	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
+#endif
+
+	/* Guarantee the minimal step advancement of the timestamp */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
+	previous_ns = ns;
+
+	return ns;
+}
+
+/*
+ * Generate UUID version 7 per RFC 9562, with the given timestamp.
+ *
+ * UUID version 7 consists of a Unix timestamp in milliseconds (48 bits) and
+ * 74 random bits, excluding the required version and variant bits. To ensure
+ * monotonicity in scenarios of high-frequency UUID generation, we employ the
+ * method "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)",
+ * described in the RFC. This method utilizes 12 bits from the "rand_a" bits
+ * to store a 1/4096 (or 2^12) fraction of sub-millisecond precision.
+ *
+ * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * used for time-dependent bits of UUID.
+ */
+static pg_uuid_t *
+generate_uuidv7(int64 ns)
+{
+	pg_uuid_t  *uuid = palloc(UUID_LEN);
+	int64		unix_ts_ms;
+	int32		increased_clock_precision;
+
+	unix_ts_ms = ns / NS_PER_MS;
+
+	/* Fill in time part */
+	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
+	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
+	uuid->data[2] = (unsigned char) (unix_ts_ms >> 24);
+	uuid->data[3] = (unsigned char) (unix_ts_ms >> 16);
+	uuid->data[4] = (unsigned char) (unix_ts_ms >> 8);
+	uuid->data[5] = (unsigned char) unix_ts_ms;
+
+	/*
+	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
+	 * SUBMS_MINIMAL_STEP_BITS)
+	 */
+	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+
+	/* Fill the increased clock precision to "rand_a" bits */
+	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
+	uuid->data[7] = (unsigned char) (increased_clock_precision);
+
+	/* fill everything after the increased clock precision with random bytes */
+	if (!pg_strong_random(&uuid->data[8], UUID_LEN - 8))
+		ereport(ERROR,
+				(errcode(ERRCODE_INTERNAL_ERROR),
+				 errmsg("could not generate random values")));
+
+#if SUBMS_MINIMAL_STEP_BITS == 10
+
+	/*
+	 * On systems that have only 10 bits of sub-ms precision, the 2 least
+	 * significant bits depend on other time-specific bits and do not
+	 * contribute to uniqueness. To make these bits random, we mix in two bits
+	 * from the CSPRNG. SUBMS_MINIMAL_STEP_NS is chosen so that we still
+	 * guarantee monotonicity despite altering these bits.
+	 */
+	uuid->data[7] = uuid->data[7] ^ (uuid->data[8] >> 6);
+#endif
+
+	/*
+	 * Set magic numbers for a "version 7" (pseudorandom) UUID and variant,
+	 * see https://www.rfc-editor.org/rfc/rfc9562#name-version-field
+	 */
+	uuid_set_version(uuid, 7);
+
+	return uuid;
+}
+
+/*
+ * Generate UUID version 7 with the current timestamp.
+ */
+Datum
+uuidv7(PG_FUNCTION_ARGS)
+{
+	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Similar to uuidv7() but with the timestamp adjusted by the given interval.
+ */
+Datum
+uuidv7_interval(PG_FUNCTION_ARGS)
+{
+	Interval   *shift = PG_GETARG_INTERVAL_P(0);
+	TimestampTz ts;
+	pg_uuid_t  *uuid;
+	int64		ns = get_real_time_ns_ascending();
+
+	/*
+	 * Shift the current timestamp by the given interval. To calculate time
+	 * shift correctly, we convert the UNIX epoch to TimestampTz and use
+	 * timestamptz_pl_interval(). Since this calculation is done with
+	 * microsecond precision, we carry nanoseconds from original ns value to
+	 * shifted ns value.
+	 */
+
+	ts = (TimestampTz) (ns / NS_PER_US) -
+		(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+
+	/* Compute time shift */
+	ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
+												 TimestampTzGetDatum(ts),
+												 IntervalPGetDatum(shift)));
+
+	/*
+	 * Convert the TimestampTz back to a UNIX epoch, carrying the nanoseconds.
+	 */
+	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
+		* NS_PER_US + ns % NS_PER_US;
+
+	/* Generate an UUIDv7 */
+	uuid = generate_uuidv7(ns);
+
+	PG_RETURN_UUID_P(uuid);
+}
+
+/*
+ * Start of a Gregorian epoch == date2j(1582,10,15)
+ * We cast it to 64-bit because it's used in overflow-prone computations
+ */
+#define GREGORIAN_EPOCH_JDATE  INT64CONST(2299161)
 
 /*
  * Extract timestamp from UUID.
  *
- * Returns null if not RFC 4122 variant or not a version that has a timestamp.
+ * Returns null if not RFC 9562 variant or not a version that has a timestamp.
  */
 Datum
 uuid_extract_timestamp(PG_FUNCTION_ARGS)
@@ -436,7 +649,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 	uint64		tms;
 	TimestampTz ts;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
@@ -455,7 +668,22 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 
 		/* convert 100-ns intervals to us, then adjust */
 		ts = (TimestampTz) (tms / 10) -
-			((uint64) POSTGRES_EPOCH_JDATE - UUIDV1_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+			((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
+		PG_RETURN_TIMESTAMPTZ(ts);
+	}
+
+	if (version == 7)
+	{
+		tms = (uuid->data[5])
+			+ (((uint64) uuid->data[4]) << 8)
+			+ (((uint64) uuid->data[3]) << 16)
+			+ (((uint64) uuid->data[2]) << 24)
+			+ (((uint64) uuid->data[1]) << 32)
+			+ (((uint64) uuid->data[0]) << 40);
+
+		/* convert ms to us, then adjust */
+		ts = (TimestampTz) (tms * NS_PER_US) -
+			(POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 		PG_RETURN_TIMESTAMPTZ(ts);
 	}
@@ -467,7 +695,7 @@ uuid_extract_timestamp(PG_FUNCTION_ARGS)
 /*
  * Extract UUID version.
  *
- * Returns null if not RFC 4122 variant.
+ * Returns null if not RFC 9562 variant.
  */
 Datum
 uuid_extract_version(PG_FUNCTION_ARGS)
@@ -475,7 +703,7 @@ uuid_extract_version(PG_FUNCTION_ARGS)
 	pg_uuid_t  *uuid = PG_GETARG_UUID_P(0);
 	uint16		version;
 
-	/* check if RFC 4122 variant */
+	/* check if RFC 9562 variant */
 	if ((uuid->data[8] & 0xc0) != 0x80)
 		PG_RETURN_NULL();
 
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ccf79761da5..0f22c217235 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9347,11 +9347,20 @@
 { oid => '3432', descr => 'generate random UUID',
   proname => 'gen_random_uuid', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9895', descr => 'generate UUID version 4',
+  proname => 'uuidv4', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'gen_random_uuid' },
+{ oid => '9896', descr => 'generate UUID version 7',
+  proname => 'uuidv7', provolatile => 'v',
+  prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
+{ oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted by specified interval',
+  proname => 'uuidv7', provolatile => 'v', proargnames => '{shift}',
+  prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
   prorettype => 'timestamptz', proargtypes => 'uuid',
   prosrc => 'uuid_extract_timestamp' },
-{ oid => '6343', descr => 'extract version from RFC 4122 UUID',
+{ oid => '6343', descr => 'extract version from RFC 9562 UUID',
   proname => 'uuid_extract_version', proleakproof => 't', prorettype => 'int2',
   proargtypes => 'uuid', prosrc => 'uuid_extract_version' },
 
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 8f4ef0d7a6a..798633ad51e 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 -- inserting invalid data tests
 -- too long
 INSERT INTO guid1(guid_field) VALUES('11111111-1111-1111-1111-111111111111F');
@@ -199,6 +204,35 @@ SELECT count(DISTINCT guid_field) FROM guid1;
      2
 (1 row)
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     2
+(1 row)
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+ count 
+-------
+     3
+(1 row)
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
+       array_agg        
+------------------------
+ {1,2,3,4,5,6,7,8,9,10}
+(1 row)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
@@ -219,8 +253,26 @@ SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
                      
 (1 row)
 
+SELECT uuid_extract_version(uuidv4());  -- 4
+ uuid_extract_version 
+----------------------
+                    4
+(1 row)
+
+SELECT uuid_extract_version(uuidv7());  -- 7
+ uuid_extract_version 
+----------------------
+                    7
+(1 row)
+
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v1
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v7
  ?column? 
 ----------
  t
@@ -239,4 +291,4 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 (1 row)
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 75ee966ded0..110188361d1 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -10,6 +10,11 @@ CREATE TABLE guid2
 	guid_field UUID,
 	text_field TEXT DEFAULT(now())
 );
+CREATE TABLE guid3
+(
+	id SERIAL,
+	guid_field UUID
+);
 
 -- inserting invalid data tests
 -- too long
@@ -97,6 +102,22 @@ INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 INSERT INTO guid1 (guid_field) VALUES (gen_random_uuid());
 SELECT count(DISTINCT guid_field) FROM guid1;
 
+-- test of uuidv4() alias
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+INSERT INTO guid1 (guid_field) VALUES (uuidv4());
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- generation test for v7
+TRUNCATE guid1;
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7());
+INSERT INTO guid1 (guid_field) VALUES (uuidv7(INTERVAL '1 day'));
+SELECT count(DISTINCT guid_field) FROM guid1;
+
+-- test sortability of v7
+INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
+SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
 -- extract functions
 
@@ -104,12 +125,15 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
 SELECT uuid_extract_version(gen_random_uuid());  -- 4
 SELECT uuid_extract_version('11111111-1111-1111-1111-111111111111');  -- null
+SELECT uuid_extract_version(uuidv4());  -- 4
+SELECT uuid_extract_version(uuidv7());  -- 7
 
 -- timestamp
-SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 4122bis test vector
+SELECT uuid_extract_timestamp('C232AB00-9414-11EC-B3C8-9F6BDECED846') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v1
+SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday, February 22, 2022 2:22:22.00 PM GMT+05:00';  -- RFC 9562 test vector for v7
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
 
 -- clean up
-DROP TABLE guid1, guid2 CASCADE;
+DROP TABLE guid1, guid2, guid3 CASCADE;
-- 
2.43.5

#210

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Masahiko Sawada (#209)

Re: UUID v7

On 10 Dec 2024, at 03:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches.

Both patches look good to me.
I'm not sure, but perhaps the commit message for unleakproofing the function should mention that the problem was reported in Peter E.'s review.

Best regards, Andrey Borodin.

#211

sawada.mshk@gmail.com

about 1 year ago

In reply to: Andrey M. Borodin (#210)

Re: UUID v7

On Mon, Dec 9, 2024 at 7:42 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:

On 10 Dec 2024, at 03:34, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I've attached the updated patches.

Both patches look good to me.
I'm not sure, but, perhaps, commit message of unleakproofing a function should mention that the problem was reported in Peter E. review.

Pushed both patches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#212

Daniel Verite

daniel@manitou-mail.org

about 1 year ago

In reply to: Andrey M. Borodin (#207)

Re: UUID v7

Andrey M. Borodin wrote:

I've addressed all items, except formatting a table...

Sorry for not following up sooner.

To illustrate my point upthread that was left unaddressed, let's say
I have a server with an incorrect date in the future.

A session generates an uuid

postgres=# select pg_backend_pid(), uuidv7();
 pg_backend_pid |                uuidv7
----------------+--------------------------------------
          13545 | 019ad701-c798-7000-a0e4-7119e2c82446

Now somebody sets the clock backward to the correct date.

Then if that backend continues to generate uuids, here's
what it outputs:

postgres=# select pg_backend_pid(), uuidv7() from generate_series(1,10);
 pg_backend_pid |                uuidv7
----------------+--------------------------------------
          13545 | 019ad701-c798-7001-8df7-d296dafd98fd
          13545 | 019ad701-c798-7002-9995-bf103bbb56d7
          13545 | 019ad701-c798-7003-88b3-5ea58c738ade
          13545 | 019ad701-c798-7004-ba5e-e675fe103060
          13545 | 019ad701-c798-7005-8608-59b9c852b4ce
          13545 | 019ad701-c798-7006-832c-d06c15e2865a
          13545 | 019ad701-c798-7007-8f45-360c0825c671
          13545 | 019ad701-c798-7008-bb47-bcb7915503b2
          13545 | 019ad701-c798-7009-9124-e6873b0265f6
          13545 | 019ad701-c798-700a-8422-8d75c5ade9f7
(10 rows)

The timestamps are now just a sequence incrementing by 1
on each call, independently of the server's clock and
the actual time span between calls. It has become a counter
and will remain so until the backend terminates.

It does not have to be that way. In get_real_time_ns_ascending(),
it could switch immediately to the new time:

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 2e32592f57..8df194daea 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -505,8 +505,11 @@ get_real_time_ns_ascending()
	ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
 #endif

-	/* Guarantee the minimal step advancement of the timestamp */
-	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+	/*
+	 * Guarantee the minimal step advancement of the timestamp,
+	 * unless the clock has moved backward.
+	 */
+	if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns && previous_ns <= ns)
		ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
	previous_ns = ns;
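
To make the difference concrete, here is a minimal standalone sketch of the
two variants of the check, outside of PostgreSQL. The function names and the
step value MIN_STEP_NS are hypothetical stand-ins chosen for this illustration
(the real constant is SUBMS_MINIMAL_STEP_NS in uuid.c), and the clock reading
is passed in as a parameter so that the behaviour can be followed by hand.

```c
#include <stdint.h>

/* Illustrative step only; uuid.c defines the real SUBMS_MINIMAL_STEP_NS. */
#define MIN_STEP_NS INT64_C(1000)

/*
 * Current behaviour: the returned value is always at least one step greater
 * than the previously returned one, even if the clock went backward.
 */
int64_t
next_ns_current(int64_t *previous_ns, int64_t clock_ns)
{
	int64_t		ns = clock_ns;

	if (*previous_ns + MIN_STEP_NS >= ns)
		ns = *previous_ns + MIN_STEP_NS;
	*previous_ns = ns;
	return ns;
}

/*
 * Variant from the diff above: the step is enforced only while the clock has
 * not moved backward; otherwise the new clock value is adopted at once.
 */
int64_t
next_ns_follow_clock(int64_t *previous_ns, int64_t clock_ns)
{
	int64_t		ns = clock_ns;

	if (*previous_ns + MIN_STEP_NS >= ns && *previous_ns <= ns)
		ns = *previous_ns + MIN_STEP_NS;
	*previous_ns = ns;
	return ns;
}
```

With previous_ns at 5000000 and a clock that has jumped back to 1000000, the
first variant returns 5001000 and keeps counting in steps, while the second
returns 1000000 and follows the clock again, at the cost of producing values
that sort before the earlier ones.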

Also PFA a prototype of making uuidv7() ordered across all backends via
keeping previous_ns in shared memory. IMO it's overcomplicating and RFC
does not require such guarantees

It does not have to be in core, but an extension might want to provide
a generator that guarantees monotonicity across backends.

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

#213

x4mmm@yandex-team.ru

about 1 year ago

In reply to: Daniel Verite (#212)

Re: UUID v7

Hi Daniel!

On 16 Dec 2024, at 19:08, Daniel Verite <daniel@manitou-mail.org> wrote:

The timestamps are now just a sequence incrementing by 1
on each call, independently of the server's clock and
the actual time span between calls. It has become a counter
and will remain so until the backend terminates.

This is exactly what the RFC suggests we do. It’s a feature, not a bug.

It does not have to be that way. In get_real_time_ns_ascending(),
it could switch immediately to the new time:

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 2e32592f57..8df194daea 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -505,8 +505,11 @@ get_real_time_ns_ascending()
ns = tmp.tv_sec * NS_PER_S + tmp.tv_nsec;
#endif

- /* Guarantee the minimal step advancement of the timestamp */
- if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns)
+ /*
+ * Guarantee the minimal step advancement of the timestamp,
+ * unless the clock has moved backward.
+ */
+ if (previous_ns + SUBMS_MINIMAL_STEP_NS >= ns && previous_ns <= ns)
ns = previous_ns + SUBMS_MINIMAL_STEP_NS;
previous_ns = ns;

We have previous_ns to protect us from clocks moving backwards, and you are suggesting that we disable this protection.
To achieve that, we could simply delete previous_ns altogether. It is there not to guarantee a minimal step, but to ensure that the clock only ever moves forward.

Also PFA a prototype of making uuidv7() ordered across all backends via
keeping previous_ns in shared memory. IMO it's overcomplicating and RFC
does not require such guarantees

It does not have to be in core, but an extension might want to provide
a generator that guarantees monotonicity across backends.

AFAIK the pg_uuidv7 extension does not have this protection right now. But Florian might add it in the future.

Best regards, Andrey Borodin.

#214

x4mmm@yandex-team.ru

12 months ago

In reply to: Masahiko Sawada (#211)

Re: UUID v7

On 12 Dec 2024, at 23:08, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Pushed

Hi Masahiko!

I’ve found some inconsistency in handling of overflow. I’m not sure we should handle it, but anyway.

postgres=# select x,
uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(237,238) x;;
  x  |   uuid_extract_timestamp    | interval
-----+-----------------------------+-----------
 237 | 2262-01-30 13:43:23.737+05  | 237 years
 238 | 10598-02-10 19:41:13.736+05 | 238 years
(2 rows)

The thing is, per the RFC we represent time as a number of nanoseconds since the UNIX epoch. And we use int64, which will overflow in the year 2262. I sincerely hope we all get to see that great year.
We can gain a couple more centuries if we resort to an unsigned 64-bit integer.

But it would be great to make our code work until

postgres=# select uuid_extract_timestamp('FFFFFFFF-FFFF-7FFF-bFFF-FFFFFFFFFFFF');
uuid_extract_timestamp
-----------------------------
10889-08-02 10:31:50.655+05
(1 row)

And using uint64 won’t help us.

Can we use int128 in code? Or, perhaps, carry this extra 10 bits in the extra argument of generate_uuidv7()? Or, perhaps, leave things as they stand now?
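
The two limits mentioned above can be checked with a little arithmetic. This
is a rough sketch, not code from the patch: the helper names are made up for
illustration, and a year is approximated as 365.2425 days.

```c
#include <stdint.h>

#define NS_PER_MS	INT64_C(1000000)
#define MS_PER_DAY	INT64_C(86400000)

/* Largest whole millisecond count whose nanosecond form still fits in int64. */
int64_t
max_ms_in_int64_ns(void)
{
	return INT64_MAX / NS_PER_MS;
}

/* Largest value of the 48-bit unix_ts_ms field of a UUIDv7. */
int64_t
max_ms_in_uuidv7(void)
{
	return (INT64_C(1) << 48) - 1;
}

/* Convert milliseconds to approximate Gregorian years. */
double
ms_to_years(int64_t ms)
{
	return (double) ms / (double) MS_PER_DAY / 365.2425;
}
```

Adding roughly 292 years to 1970 gives 2262, and adding roughly 8919 years
gives 10889, which matches the outputs shown above.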

Thanks!

Best regards, Andrey Borodin.

#215

sawada.mshk@gmail.com

12 months ago

In reply to: Andrey Borodin (#214)

Re: UUID v7

On Thu, Jan 30, 2025 at 12:59 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 12 Dec 2024, at 23:08, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Pushed

Hi Masahiko!

I’ve found some inconsistency in handling of overflow. I’m not sure we should handle it, but anyway.

Thank you for the report!

postgres=# select x,
uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(237,238) x;;
x | uuid_extract_timestamp | interval
-----+-----------------------------+-----------
237 | 2262-01-30 13:43:23.737+05 | 237 years
238 | 10598-02-10 19:41:13.736+05 | 238 years
(2 rows)

The thing is per RFC we represent time as number of nanoseconds since UNIX epoch. And we use int64, which will overflow in year 2262. I sincerely wish us to see this great year.
We can have a couple more centuries if we resort to unsigned int 64.

But it would be great to make our code work until

postgres=# select uuid_extract_timestamp('FFFFFFFF-FFFF-7FFF-bFFF-FFFFFFFFFFFF');
uuid_extract_timestamp
-----------------------------
10889-08-02 10:31:50.655+05
(1 row)

And using uint64 won’t help us.

I don't think using uint64 instead of int64 for nanoseconds resolves
the problem. We would not be able to shift the timestamp to a
date before 1970/1/1.

Or, perhaps, carry this extra 10 bits in the extra argument of generate_uuidv7()?

I like this idea. Would you like to write a patch, or shall I?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#216

x4mmm@yandex-team.ru

12 months ago

In reply to: Masahiko Sawada (#215)

1 attachment(s)

Re: UUID v7

On 31 Jan 2025, at 00:54, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I like this idea. Would you like to write a patch, or shall I?

I propose to separate milliseconds from nanoseconds. Please find attached implementation of this.
With this patch we can generate correct UUIDs in a very distant future.
postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(1,9000,1000) x;
  x   |   uuid_extract_timestamp    |  interval
------+-----------------------------+------------
    1 | 2026-01-31 12:00:53.084+05  | 1 year
 1001 | 3026-01-31 12:00:53.084+05  | 1001 years
 2001 | 4026-01-31 12:00:53.084+05  | 2001 years
 3001 | 5026-01-31 12:00:53.084+05  | 3001 years
 4001 | 6026-01-31 12:00:53.084+05  | 4001 years
 5001 | 7026-01-31 12:00:53.085+05  | 5001 years
 6001 | 8026-01-31 12:00:53.085+05  | 6001 years
 7001 | 9026-01-31 12:00:53.085+05  | 7001 years
 8001 | 10026-01-31 12:00:53.085+05 | 8001 years

Best regards, Andrey Borodin.

Attachments:

0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch (application/octet-stream)

From 8b52d0942f657c35e238bd95bd2d95aa5c4a5b2e Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Fri, 31 Jan 2025 12:03:16 +0500
Subject: [PATCH] UUDv7: fix offset computations in dates after 2262

We used nanosecond representation of offsetted time values which
cannot be stored in 64-bit integer for dates significantly after
beginning of UNIX epoch. To prevent overflow we separate millisecond
part from nanoseconds, thus allowing us to store both parts in 64-bit
integers.
---
 src/backend/utils/adt/uuid.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f8402ef925..3349c2674c8 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -68,7 +68,7 @@ static int	uuid_fast_cmp(Datum x, Datum y, SortSupport ssup);
 static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
 static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
-static inline int64 get_real_time_ns_ascending();
+static inline uint64 get_real_time_ns_ascending();
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -476,11 +476,11 @@ gen_random_uuid(PG_FUNCTION_ARGS)
  * The returned timestamp is ensured to be at least SUBMS_MINIMAL_STEP greater
  * than the previous returned timestamp (on this backend).
  */
-static inline int64
+static inline uint64
 get_real_time_ns_ascending()
 {
-	static int64 previous_ns = 0;
-	int64		ns;
+	static uint64 previous_ns = 0;
+	uint64		ns;
 
 	/* Get the current real timestamp */
 
@@ -527,13 +527,13 @@ get_real_time_ns_ascending()
  * used for time-dependent bits of UUID.
  */
 static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 ms, uint64 ns_in_ms)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
 	int64		unix_ts_ms;
 	int32		increased_clock_precision;
 
-	unix_ts_ms = ns / NS_PER_MS;
+	unix_ts_ms = ms;
 
 	/* Fill in time part */
 	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
@@ -547,7 +547,7 @@ generate_uuidv7(int64 ns)
 	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
 	 * SUBMS_MINIMAL_STEP_BITS)
 	 */
-	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+	increased_clock_precision = ((ns_in_ms) * (1 << SUBMS_BITS)) / NS_PER_MS;
 
 	/* Fill the increased clock precision to "rand_a" bits */
 	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
@@ -586,7 +586,8 @@ generate_uuidv7(int64 ns)
 Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
-	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+	uint64		ns = get_real_time_ns_ascending();
+	pg_uuid_t  *uuid = generate_uuidv7(ns / NS_PER_MS, ns % NS_PER_MS);
 
 	PG_RETURN_UUID_P(uuid);
 }
@@ -600,7 +601,9 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	Interval   *shift = PG_GETARG_INTERVAL_P(0);
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
-	int64		ns = get_real_time_ns_ascending();
+	/* 64 bits is enough for real time, but not for a time range of UUID */
+	uint64		ns = get_real_time_ns_ascending();
+	uint64		us;
 
 	/*
 	 * Shift the current timestamp by the given interval. To calculate time
@@ -621,11 +624,10 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	/*
 	 * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
 	 */
-	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
-		* NS_PER_US + ns % NS_PER_US;
+	us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC);
 
 	/* Generate an UUIDv7 */
-	uuid = generate_uuidv7(ns);
+	uuid = generate_uuidv7(us / 1000, (us % 1000) * 1000 + ns % NS_PER_US);
 
 	PG_RETURN_UUID_P(uuid);
 }
-- 
2.42.0

#217

sawada.mshk@gmail.com

12 months ago

In reply to: Andrey Borodin (#216)

Re: UUID v7

On Thu, Jan 30, 2025 at 11:09 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 31 Jan 2025, at 00:54, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I like this idea. Would you like to write a patch, or shall I?

I propose to separate milliseconds from nanoseconds. Please find attached implementation of this.
With this patch we can generate correct UUIDs in a very distant future.
postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(1,9000,1000) x;
x | uuid_extract_timestamp | interval
------+-----------------------------+------------
1 | 2026-01-31 12:00:53.084+05 | 1 year
1001 | 3026-01-31 12:00:53.084+05 | 1001 years
2001 | 4026-01-31 12:00:53.084+05 | 2001 years
3001 | 5026-01-31 12:00:53.084+05 | 3001 years
4001 | 6026-01-31 12:00:53.084+05 | 4001 years
5001 | 7026-01-31 12:00:53.085+05 | 5001 years
6001 | 8026-01-31 12:00:53.085+05 | 6001 years
7001 | 9026-01-31 12:00:53.085+05 | 7001 years
8001 | 10026-01-31 12:00:53.085+05 | 8001 years
(9 rows)

Thank you for the patch! I agree with the basic direction of this fix.
Here are some review comments:

---
-static inline int64 get_real_time_ns_ascending();
+static inline uint64 get_real_time_ns_ascending();

IIUC we don't need to replace int64 with uint64 if we have two
separate parameters for generate_uuidv7(). It seems to be conventional
to use a signed int for timestamps.

---
Need to update the function comment of generate_uuidv7() as we changed
the function arguments.

---
-       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
-               * NS_PER_US + ns % NS_PER_US;
+       us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC);

        /* Generate an UUIDv7 */
-       uuid = generate_uuidv7(ns);
+       uuid = generate_uuidv7(us / 1000, (us % 1000) * 1000 + ns % NS_PER_US);

I think we can have an inline function or a macro (or use TMODULO()?)
to split nanoseconds into milliseconds and sub-milliseconds so that
uuidv7() and uuidv7_interval() can pass them to generate_uuidv7().

The comments in uuidv7_interval() also need to be updated accordingly.

---
I think we need to consider how we can handle the timestamp shifting.
UUIDv7 contains a 48-bit Unix timestamp at millisecond precision,
which can represent timestamps approximately between 2493 BC and 6432
AD. If users specify an interval that shifts the timestamp beyond this
range, the 48-bit timestamp would wrap around and they would not
get the expected result. Do we need to raise an error in that
case?

---
Another problem I found in uuid_extract_timestamp() is that it cannot
correctly extract a timestamp before 1970/1/1 stored in a UUIDv7
value:

postgres(1:1795331)=# select year, uuid_extract_timestamp(uuidv7((year
|| 'year ago')::interval)) from generate_series(54, 56) year;
 year |   uuid_extract_timestamp
------+-----------------------------
   54 | 1971-01-31 10:46:25.111-08
   55 | 1970-01-31 10:46:25.111-08
   56 | 10888-09-01 17:18:15.768-07
(3 rows)

The problem is that we correctly store a negative timestamp value in a
UUIDv7 value but uuid_extract_timestamp() unconditionally treats it as
a positive timestamp value. I think this is a separate bug we need to
fix.
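
The wraparound can be reproduced in a few lines of standalone C. This is a
sketch with hypothetical helper names, not the code in uuid.c: it stores a
signed millisecond count in a 48-bit field and reads it back both ways.
Whether a signed reading is desirable at all is a separate question, since it
depends on whether the unix_ts_ms field is meant to be signed.

```c
#include <stdint.h>

#define TS48_MASK UINT64_C(0xFFFFFFFFFFFF)

/* Keep only the low 48 bits, as happens when the bytes are written out. */
uint64_t
store_ts48(int64_t unix_ts_ms)
{
	return (uint64_t) unix_ts_ms & TS48_MASK;
}

/* Unsigned reading: a pre-1970 value comes back as a far-future one. */
int64_t
read_ts48_unsigned(uint64_t field)
{
	return (int64_t) field;
}

/* Signed reading: treat bit 47 as the sign bit and extend it. */
int64_t
read_ts48_signed(uint64_t field)
{
	int64_t		value = (int64_t) field;

	if (field & (UINT64_C(1) << 47))
		value -= INT64_C(1) << 48;
	return value;
}
```

For example, -1000 ms (one second before the epoch) is stored as
0xFFFFFFFFFC18; the unsigned reading yields 281474976709656 ms, which lands
in the year 10889, while the signed reading recovers -1000.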

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#218

x4mmm@yandex-team.ru

11 months ago

In reply to: Masahiko Sawada (#217)

1 attachment(s)

Re: UUID v7

On 31 Jan 2025, at 23:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the patch! I agree with the basic direction of this fix.
Here are some review comments:
---
-static inline int64 get_real_time_ns_ascending();
+static inline uint64 get_real_time_ns_ascending();
IIUC we don't need to replace int64 with uint64 if we have two
separate parameters for generate_uuidv7(). It seems to be conventional
to use a signed int for timestamps.

OK, done.

---
Need to update the function comment of generate_uuidv7() as we changed
the function arguments.

Done.

---
-       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
-               * NS_PER_US + ns % NS_PER_US;
+       us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC);
/* Generate an UUIDv7 */
-       uuid = generate_uuidv7(ns);
+       uuid = generate_uuidv7(us / 1000, (us % 1000) * 1000 + ns % NS_PER_US);
I think we can have an inline function or a marco (or use TMODULO()?)
to split nanoseconds into milliseconds and sub-milliseconds so that
uuidv7() and uuidv7_interval() can pass them to generate_uuidv7().

I doubt that such a macro would make the code more readable. I've replaced 1000 with macros.

The comments in uuidv7_interval() also need to be updated accordingly.

Done.

---
I think we need to consider how we can handle the timestamp shifting.
UUIDv7 contains 48 bits Unix timestamp at milliseconds precision,
which can represent timestamps approximately between 2493 BC and 6432
AC. If users specify an interval to shift the timestamp beyond the
range, 48-bits timestamp would be wrapped around and they would not be
able to get an expected result. Do we need to raise an error in that
case?

---
Another problem I found in uuid_extract_timestamp() is that it cannot
correctly extract a timestamp before 1970/1/1 stored in a UUIDv7
value:

postgres(1:1795331)=# select year, uuid_extract_timestamp(uuidv7((year
|| 'year ago')::interval)) from generate_series(54, 56) year;
year | uuid_extract_timestamp
------+-----------------------------
54 | 1971-01-31 10:46:25.111-08
55 | 1970-01-31 10:46:25.111-08
56 | 10888-09-01 17:18:15.768-07
(3 rows)

The problem is that we correctly store a negative timestamp value in a
UUIDv7 value but uuid_extract_timestamp() unconditionally treats it as
a positive timestamp value. I think this is a separate bug we need to
fix.

RFC says unix_ts_ms is unsigned. So, luckily, no BC dates. I bet Pharaohs could not measure nanoseconds.
I think it's totally fine to wrap UUID values around year 10598 without an error.

I was thinking about incorporating test like this.

With this patch we can generate correct UUIDs in a very distant future.
postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(1,9000,1000) x;
x | uuid_extract_timestamp | interval
------+-----------------------------+------------
1 | 2026-01-31 12:00:53.084+05 | 1 year
1001 | 3026-01-31 12:00:53.084+05 | 1001 years
2001 | 4026-01-31 12:00:53.084+05 | 2001 years
3001 | 5026-01-31 12:00:53.084+05 | 3001 years
4001 | 6026-01-31 12:00:53.084+05 | 4001 years
5001 | 7026-01-31 12:00:53.085+05 | 5001 years
6001 | 8026-01-31 12:00:53.085+05 | 6001 years
7001 | 9026-01-31 12:00:53.085+05 | 7001 years
8001 | 10026-01-31 12:00:53.085+05 | 8001 years
(9 rows)

or maybe something simple like

with u as (select uuidv7() id) select uuid_extract_timestamp(uuidv7('9999-09-09 12:34:56.789+05' - uuid_extract_timestamp(u.id))) from u;

But it would still be flaky, second call to uuidv7() can overflow a millisecond.

Thanks!

Best regards, Andrey Borodin.

Attachments:

v2-0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch (application/octet-stream)

From 6890f52395a924f6af33eb86c2c3addb204ef483 Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Fri, 31 Jan 2025 12:03:16 +0500
Subject: [PATCH v2] UUDv7: fix offset computations in dates after 2262

We used nanosecond representation of offsetted time values which
cannot be stored in 64-bit integer for dates significantly after
beginning of UNIX epoch. To prevent overflow we separate millisecond
part from nanoseconds, thus allowing us to store both parts in 64-bit
integers.
---
 src/backend/utils/adt/uuid.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f8402ef92..f579bb3a64 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -28,6 +28,7 @@
 /* helper macros */
 #define NS_PER_S	INT64CONST(1000000000)
 #define NS_PER_MS	INT64CONST(1000000)
+#define US_PER_MS	INT64CONST(1000)
 #define NS_PER_US	INT64CONST(1000)
 
 /*
@@ -69,6 +70,7 @@ static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
 static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
 static inline int64 get_real_time_ns_ascending();
+static pg_uuid_t *generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms);
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -523,18 +525,16 @@ get_real_time_ns_ascending()
  * described in the RFC. This method utilizes 12 bits from the "rand_a" bits
  * to store a 1/4096 (or 2^12) fraction of sub-millisecond precision.
  *
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * ns_in_ms is a number of nanoseconds within millisecond. These values are
  * used for time-dependent bits of UUID.
  */
 static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
-	int64		unix_ts_ms;
 	int32		increased_clock_precision;
 
-	unix_ts_ms = ns / NS_PER_MS;
-
 	/* Fill in time part */
 	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
 	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
@@ -547,7 +547,7 @@ generate_uuidv7(int64 ns)
 	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
 	 * SUBMS_MINIMAL_STEP_BITS)
 	 */
-	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+	increased_clock_precision = ((ns_in_ms) * (1 << SUBMS_BITS)) / NS_PER_MS;
 
 	/* Fill the increased clock precision to "rand_a" bits */
 	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
@@ -586,7 +586,8 @@ generate_uuidv7(int64 ns)
 Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
-	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+	int64		ns = get_real_time_ns_ascending();
+	pg_uuid_t  *uuid = generate_uuidv7(ns / NS_PER_MS, ns % NS_PER_MS);
 
 	PG_RETURN_UUID_P(uuid);
 }
@@ -600,7 +601,9 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	Interval   *shift = PG_GETARG_INTERVAL_P(0);
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
+	/* 64 bits is enough for real time, but not for a time range of UUID */
 	int64		ns = get_real_time_ns_ascending();
+	int64		us;
 
 	/*
 	 * Shift the current timestamp by the given interval. To calculate time
@@ -621,11 +624,10 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	/*
 	 * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
 	 */
-	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
-		* NS_PER_US + ns % NS_PER_US;
+	us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC);
 
 	/* Generate an UUIDv7 */
-	uuid = generate_uuidv7(ns);
+	uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) * NS_PER_US + ns % NS_PER_US);
 
 	PG_RETURN_UUID_P(uuid);
 }
-- 
2.39.5 (Apple Git-154)

#219

sergeyprokhorenko@yahoo.com.au

11 months ago

In reply to: Andrey Borodin (#218)

2 attachment(s)

Re: UUID v7

Dear colleagues,

I would like to present for discussion my attached new draft documentation on UUID functions (Section 9.14. UUID Functions), which replaces the previously proposed draft at https://www.postgresql.org/docs/devel/functions-uuid.html. I have preserved and significantly supplemented the text that was there.

I have the following goals:

1. State that from now on, the function uuidv7(), rather than autoincrement, is the default choice for generating primary keys

2. Describe the advantages of uuidv7() over autoincrement and uuidv4()

3. Refute the often-cited imaginary disadvantages of UUIDv7 compared to autoincrement, such as:

- Lower performance (see the refutation in the article "UUID Benchmark War" https://ardentperf.com/2024/02/03/uuid-benchmark-war/)

- Disclosure of date and time of record creation in the table (in reality, the timestamp offset parameter distorts this information)

4. Confirm the fault tolerance of the uuidv7() function in all possible critical situations, namely:

- System clock failure

- Receiving an invalid value of the offset argument, which would result in a timestamp overflow or a negative timestamp

Regards,

Sergey Prokhorenko

sergeyprokhorenko@yahoo.com.au

#220

sawada.mshk@gmail.com

11 months ago

In reply to: Andrey Borodin (#218)

Re: UUID v7

On Sun, Feb 2, 2025 at 2:15 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 31 Jan 2025, at 23:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Thank you for the patch! I agree with the basic direction of this fix.
Here are some review comments:
---
-static inline int64 get_real_time_ns_ascending();
+static inline uint64 get_real_time_ns_ascending();
IIUC we don't need to replace int64 with uint64 if we have two
separate parameters for generate_uuidv7(). It seems to be conventional
to use a signed int for timestamps.
OK, done.

---
Need to update the function comment of generate_uuidv7() as we changed
the function arguments.

Done.
---
-       ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
-               * NS_PER_US + ns % NS_PER_US;
+       us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC);
/* Generate an UUIDv7 */
-       uuid = generate_uuidv7(ns);
+       uuid = generate_uuidv7(us / 1000, (us % 1000) * 1000 + ns % NS_PER_US);
I think we can have an inline function or a marco (or use TMODULO()?)
to split nanoseconds into milliseconds and sub-milliseconds so that
uuidv7() and uuidv7_interval() can pass them to generate_uuidv7().
I doubt that such macro will make core more readable. I've replaced 1000 with macros.

The comments in uuidv7_interval() also need to be updated accordingly.

Done.

---
I think we need to consider how we can handle the timestamp shifting.
UUIDv7 contains 48 bits Unix timestamp at milliseconds precision,
which can represent timestamps approximately between 2493 BC and 6432
AC. If users specify an interval to shift the timestamp beyond the
range, 48-bits timestamp would be wrapped around and they would not be
able to get an expected result. Do we need to raise an error in that
case?

---
Another problem I found in uuid_extract_timestamp() is that it cannot
correctly extract a timestamp before 1970/1/1 stored in a UUIDv7
value:

postgres(1:1795331)=# select year, uuid_extract_timestamp(uuidv7((year
|| 'year ago')::interval)) from generate_series(54, 56) year;
year | uuid_extract_timestamp
------+-----------------------------
54 | 1971-01-31 10:46:25.111-08
55 | 1970-01-31 10:46:25.111-08
56 | 10888-09-01 17:18:15.768-07
(3 rows)

The problem is that we correctly store a negative timestamp value in a
UUIDv7 value but uuid_extract_timestamp() unconditionally treats it as
a positive timestamp value. I think this is a separate bug we need to
fix.

RFC says unix_ts_ms is unsigned. So, luckily, no BC dates.

Good to know.

I think it's totally fine to wrap UUID values around year 10598 without an error.

Okay.
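
As an illustration of the wrap-around discussed above, here is a minimal C sketch. It is not the PostgreSQL code: the helper names store_ts48 and load_ts48 are hypothetical, and the sketch assumes the millisecond count is simply truncated to the 48-bit unix_ts_ms field and read back as an unsigned integer, as the RFC describes.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low 48 bits that UUIDv7 reserves for unix_ts_ms. */
#define UUIDV7_TS_MASK	((UINT64_C(1) << 48) - 1)

/*
 * Hypothetical helper: keep only the low 48 bits of a signed millisecond
 * count. Converting a negative int64_t to uint64_t is well defined in C
 * (reduction modulo 2^64), so a negative value wraps to the top of the range.
 */
static inline uint64_t
store_ts48(int64_t unix_ts_ms)
{
	return ((uint64_t) unix_ts_ms) & UUIDV7_TS_MASK;
}

/* Hypothetical helper: interpret the 48-bit field as an unsigned count. */
static inline int64_t
load_ts48(uint64_t field)
{
	assert(field <= UUIDV7_TS_MASK);
	return (int64_t) field;
}
```

Under these assumptions, a timestamp one millisecond before the epoch (-1) is stored as 0xFFFFFFFFFFFF and read back as 281474976710655 ms, the very end of the representable range. This matches the year-10888 row in the query output above.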

I was thinking about incorporating test like this.

With this patch we can generate correct UUIDs in a very distant future.
postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(1,9000,1000) x;
x | uuid_extract_timestamp | interval
------+-----------------------------+------------
1 | 2026-01-31 12:00:53.084+05 | 1 year
1001 | 3026-01-31 12:00:53.084+05 | 1001 years
2001 | 4026-01-31 12:00:53.084+05 | 2001 years
3001 | 5026-01-31 12:00:53.084+05 | 3001 years
4001 | 6026-01-31 12:00:53.084+05 | 4001 years
5001 | 7026-01-31 12:00:53.085+05 | 5001 years
6001 | 8026-01-31 12:00:53.085+05 | 6001 years
7001 | 9026-01-31 12:00:53.085+05 | 7001 years
8001 | 10026-01-31 12:00:53.085+05 | 8001 years
(9 rows)

or maybe something simple like

with u as (select uuidv7() id) select uuid_extract_timestamp(uuidv7('9999-09-09 12:34:56.789+05' - uuid_extract_timestamp(u.id))) from u;

But it would still be flaky, second call to uuidv7() can overflow a millisecond.

Something like following queries might be workable for example?

create table test (c serial, d uuid, t timestamptz generated always as
(uuid_extract_timestamp(d)) stored);
insert into test (d) select uuidv7((n || 'years')::interval) from
generate_series(1, 2000) n;
select count(*) from (select t - lag(t) over (order by c) as diff from
test) where diff > '10 year' ;

Here are some review comments:

#define NS_PER_S INT64CONST(1000000000)
#define NS_PER_MS INT64CONST(1000000)
+#define US_PER_MS INT64CONST(1000)
#define NS_PER_US INT64CONST(1000)

I think it's clear if we put US_PER_MS below NS_PER_US.

---
  *
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * ns_in_ms is a number of nanoseconds within millisecond. These values are
  * used for time-dependent bits of UUID.

I think we can mention that the RFC describes that stored unix
timestamp as an unsigned integer.

---
 static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms)

How about renaming ns_in_ms with sub_ms?

---
+        /* 64 bits is enough for real time, but not for a time range of UUID */

I could not understand the point of this comment. It seems to say that
64-bits is not enough for a time range of UUID, but doesn't the time
range of UUIDv7 use only 48 bits? It seems to need more comments.

---
-        ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
-                * NS_PER_US + ns % NS_PER_US;
+        us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC);

         /* Generate an UUIDv7 */
-        uuid = generate_uuidv7(ns);
+        uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) *
NS_PER_US + ns % NS_PER_US);

Need to update comments in uuidv7_interval() such as:

/*
* Shift the current timestamp by the given interval. To calculate time
* shift correctly, we convert the UNIX epoch to TimestampTz and use
* timestamptz_pl_interval(). Since this calculation is done with
* microsecond precision, we carry nanoseconds from original ns value to
* shifted ns value.
*/

and

/*
* Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
*/

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#221

sawada.mshk@gmail.com

11 months ago

In reply to: Sergey Prokhorenko (#219)

Re: UUID v7

On Sun, Feb 2, 2025 at 11:41 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Dear colleagues,

I would like to present for discussion my attached new draft documentation on UUID functions (Section 9.14. UUID Functions), which replaces the previously proposed draft at https://www.postgresql.org/docs/devel/functions-uuid.html. I have preserved and significantly supplemented the text that was there.

I have the following goals:

1. State that from now on, the function uuidv7(), rather than autoincrement, is the default choice for generating primary keys

2. Describe the advantages of uuidv7() over autoincrement and uuidv4()

3. Refute the often-cited imaginary disadvantages of UUIDv7 compared to autoincrement, such as:

- Lower performance (see the refutation in the article "UUID Benchmark War" https://ardentperf.com/2024/02/03/uuid-benchmark-war/)

- Disclosure of date and time of record creation in the table (in reality, the timestamp offset parameter distorts this information)

4. Confirm the fault tolerance of the uuidv7() function in all possible critical situations, namely:

- System clock failure

- Receiving an invalid value of the offset argument, which would result in a timestamp overflow or a negative timestamp

Thank you for the proposal. Could you share the proposed document as a
.diff or .patch file? That would be easier to review the updates.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#222

sergeyprokhorenko@yahoo.com.au

11 months ago

In reply to: Masahiko Sawada (#221)

1 attachment(s)

Re: UUID v7

On Wednesday 5 February 2025 at 01:07:02 am GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Feb 2, 2025 at 11:41 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Dear colleagues,

I would like to present for discussion my attached new draft documentation on UUID functions (Section 9.14. UUID Functions), which replaces the previously proposed draft at https://www.postgresql.org/docs/devel/functions-uuid.html. I have preserved and significantly supplemented the text that was there.

I have the following goals:

1. State that from now on, the function uuidv7(), rather than autoincrement, is the default choice for generating primary keys

2. Describe the advantages of uuidv7() over autoincrement and uuidv4()

3. Refute the often-cited imaginary disadvantages of UUIDv7 compared to autoincrement, such as:

- Lower performance (see the refutation in the article "UUID Benchmark War" https://ardentperf.com/2024/02/03/uuid-benchmark-war/)

- Disclosure of date and time of record creation in the table (in reality, the timestamp offset parameter distorts this information)

4. Confirm the fault tolerance of the uuidv7() function in all possible critical situations, namely:

- System clock failure

- Receiving an invalid value of the offset argument, which would result in a timestamp overflow or a negative timestamp

Thank you for the proposal. Could you share the proposed document as a
.diff or .patch file? That would be easier to review the updates.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
_________________________________________________________

Dear colleagues,

I have attached a file changed.diff containing my proposed changes to the documentation on UUID functions (Section 9.14. UUID Functions). The text of the changes in this file is updated, and it is slightly different from the previously submitted text.

Regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

Attachments:

changed.diff (application/octet-stream)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml

index 7efc819..f7d3f6e 100644

--- a/doc/src/sgml/func.sgml

+++ b/doc/src/sgml/func.sgml

@@ -14323,31 +14323,95 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

   </indexterm>

 

   <para>

-   <productname>PostgreSQL</productname> includes several functions to generate a UUID.

+	<productname>PostgreSQL</productname> provides functions for generating 

+	Universally Unique Identifiers (UUIDs) as defined by 

+	<ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>. 

+	This section details the UUID functions included in the core distribution.

+  </para>

+

+  <para>

+   The <xref linkend="uuid-ossp"/> module provides additional functions that

+   implement other standard algorithms for generating UUIDs.

+  </para>

+

+  <para>

+   <productname>PostgreSQL</productname> also provides the usual comparison 

+   operators shown in <xref linkend="functions-comparison-op-table"/> for

+   UUIDs.

+  </para>

+

+  <sect2 id="functions-generating-uuidv7">

+   <title>Generating Version 7 UUIDs</title>

+

+<synopsis>

+<function>uuidv7</function> (<optional> <parameter>offset</parameter> 

+<type>interval</type> </optional>) <returnvalue>uuid</returnvalue>

+</synopsis>

+

+   <para>

+	The <function>uuidv7()</function> function is designed as the preferred 

+	method for generating primary keys, offering an alternative to integer data 

+	types backed by sequence generators.

+   </para>

+   

+   <para>

+	The function returns a version 7 UUID, which includes a UNIX timestamp with 

+	millisecond precision, a 12-bit sub-millisecond timestamp, and a random 

+	component. This function can accept optional <parameter>offset</parameter> 

+	parameter of type <type>interval</type> which is added to the internal 

+	timestamp.

+   </para>

+

+   <para>

+	Monotonically increasing identifiers are generated even if the system clock 

+	jumps backward, if access to the system clock is unavailable, or if UUIDs 

+	are generated at a very high frequency, due to the internal timestamp 

+	functioning as a counter to maintain order.

+   </para>

+

+   <para>

+	If the <parameter>offset</parameter> parameter results in a timestamp 

+	overflow or a negative timestamp, an adjusted timestamp value is 

+	automatically used. The timestamp behaves like a ring buffer: when the 

+	maximum value is exceeded, it wraps around to the minimum value. Similarly, 

+	if the absolute value of the negative <parameter>offset</parameter> exceeds 

+	the time elapsed since 00:00:00 UTC on 1 January, 1970, the timestamp wraps 

+	around to the maximum value.

+   </para>

+

+  </sect2>

+     

+  <sect2 id="functions-generating-uuidv4">

+   <title>Generating Version 4 UUIDs</title>

+

 <synopsis>

 <function>gen_random_uuid</function> () <returnvalue>uuid</returnvalue>

 <function>uuidv4</function> () <returnvalue>uuid</returnvalue>

 </synopsis>

+

+   <para>

    These functions return a version 4 (random) UUID.

-<synopsis>

-<function>uuidv7</function> (<optional> <parameter>shift</parameter> <type>interval</type> </optional>) <returnvalue>uuid</returnvalue>

-</synopsis>

-    This function returns a version 7 UUID (UNIX timestamp with millisecond

-    precision + sub-millisecond timestamp + random). This function can accept

-    optional <parameter>shift</parameter> parameter of type <type>interval</type>

-    which shift internal timestamp by the given interval.

-  </para>

+   </para>

+   

+   <para>

+	They are not recommended for generation of primary keys.

+   </para>

+

+  </sect2>

+    

+  <sect2 id="functions-extracting-data-from-uuid">

+   <title>Extracting Data from UUIDs</title>

 

   <para>

-   The <xref linkend="uuid-ossp"/> module provides additional functions that

-   implement other standard algorithms for generating UUIDs.

+There are also two functions to extract data from UUIDs:

   </para>

 

-  <para>

-   There are also functions to extract data from UUIDs:

 <synopsis>

-<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamp with time zone</returnvalue>

+<function>uuid_extract_timestamp</function> (uuid) <returnvalue>timestamp with 

+time zone</returnvalue>

 </synopsis>

+

+  <para>

    This function extracts a <type>timestamp with time zone</type> from UUID

    version 1 and 7.  For other versions, this function returns null.  Note that

    the extracted timestamp is not necessarily exactly equal to the time the

@@ -14355,22 +14419,195 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

    UUID.

   </para>

 

-  <para>

 <synopsis>

-<function>uuid_extract_version</function> (uuid) <returnvalue>smallint</returnvalue>

+<function>uuid_extract_version</function> (uuid) 

+<returnvalue>smallint</returnvalue>

 </synopsis>

+

+  <para>

    This function extracts the version from a UUID of the variant described by

-   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>.  For

-   other variants, this function returns null.  For example, for a UUID

+   <ulink url="https://datatracker.ietf.org/doc/html/rfc9562">RFC 9562</ulink>. 

+   For other variants, this function returns null. For example, for a UUID

    generated by <function>gen_random_uuid</function>, this function will

    return 4.

   </para>

 

-  <para>

-   <productname>PostgreSQL</productname> also provides the usual comparison

-   operators shown in <xref linkend="functions-comparison-op-table"/> for

-   UUIDs.

-  </para>

+  </sect2>  

+  

+  <sect2 id="functions-uuid-type-choice">

+   <title>Deciding Whether and Which UUID to Use</title>

+

+   <para>

+	UUIDs serve as unique identifiers. Alternatives include integer data types 

+	backed by a sequence generator. When choosing between them for primary keys, 

+	consider the following information.

+   </para>

+

+    <informaltable>

+     <tgroup cols="6">

+      <thead>

+       <row>

+        <entry>No.</entry>

+        <entry>Disadvantages or limitations of identifier types</entry>

+        <entry>uuidv4()</entry>

+        <entry>uuidv7()</entry>

+		<entry>uuidv7(<parameter>offset</parameter>)</entry>

+		<entry><type>identity</type> or <type>bigserial</type></entry>

+       </row>

+      </thead>

+

+      <tbody>

+

+       <row>

+        <entry>1.</entry>

+        <entry>Data merging necessitates key replacement; furthermore, 

+		maintaining key relationships requires disk space</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+       </row>

+

+       <row>

+        <entry>2.</entry>

+        <entry>Exporting data to external information systems requires key 

+		replacement</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+       </row>

+

+       <row>

+        <entry>3.</entry>

+        <entry>Synchronization is necessary for distributed generation across 

+		multiple processes (microservices)</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>	

+       </row>

+

+       <row>

+        <entry>4.</entry>

+        <entry>Lock contention arises from concurrent writes to the same table 

+		by multiple processes (microservices)</entry>

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+        <entry>NO (with several offsets)</entry>		

+        <entry>YES</entry>		

+       </row>

+

+       <row>

+        <entry>5.</entry>

+        <entry>Identifier locality absence reduces performance and increases 

+		index size</entry>

+        <entry>YES</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+       </row>

+

+       <row>

+        <entry>6.</entry>

+        <entry>Identifier locality absence results in inefficient 

+		partitioning</entry>

+        <entry>YES</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+       </row>

+

+       <row>

+        <entry>7.</entry>

+        <entry>The record creation order is unknown for logging systems, 

+		time-series databases, debugging, and auditing</entry>

+        <entry>YES</entry>		

+        <entry>NO</entry>		

+        <entry>NO (with nondecreasing offsets)</entry>		

+        <entry>NO</entry>		

+       </row>

+

+       <row>

+        <entry>8.</entry>

+        <entry>The number of records within the table is disclosed</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+       </row>

+

+       <row>

+        <entry>9.</entry>

+        <entry>The record creation date and time are disclosed</entry>

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+       </row>

+

+       <row>

+        <entry>10.</entry>

+        <entry>A longer identifier is utilized</entry>

+        <entry>YES</entry>		

+        <entry>YES</entry>		

+        <entry>YES</entry>		

+        <entry>NO</entry>		

+       </row>

+

+       <row>

+        <entry>11.</entry>

+        <entry>Identifier-based full-text search is ambiguous</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+       </row>

+

+       <row>

+        <entry>12.</entry>

+        <entry>Erroneous accidental key matches occur</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+       </row>

+

+       <row>

+        <entry>13.</entry>

+        <entry>Valid keys can be generated maliciously</entry>

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>NO</entry>		

+        <entry>YES</entry>		

+       </row>

+

+      </tbody>

+     </tgroup>

+    </informaltable>

+

+   <para>

+	When generating identifiers simultaneously in several client sessions, the 

+	<function>uuidv7()</function> function does not guarantee monotonicity, 

+	although monotonicity is usually preserved in such a situation.

+   </para>

+

+   <para>

+	In real-world scenarios, the performance of keys generated by 

+	<function>uuidv7()</function> function is nearly equivalent to that of 

+	<type>identity</type> or <type>bigserial</type> type, significantly 

+	outperforming <function>uuidv4()</function>.

+   </para>

+

+   <para>

+	It is advisable to assess the performance of keys generated by different 

+	methods using the <xref linkend="pgbench"/> benchmarking utility, along 

+	with custom scenarios and script files tailored to your specific 

+	requirements.

+   </para>

+

+  </sect2>  

+  

  </sect1>

 

  <sect1 id="functions-xml">

#223

x4mmm@yandex-team.ru

11 months ago

In reply to: Masahiko Sawada (#220)

1 attachment(s)

Re: UUID v7

I've taken into account Sergey's note that "offset" is a better name for the uuidv7() argument than "shift".

On 5 Feb 2025, at 03:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I was thinking about incorporating test like this.

With this patch we can generate correct UUIDs in a very distant future.
postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(1,9000,1000) x;
x | uuid_extract_timestamp | interval
------+-----------------------------+------------
1 | 2026-01-31 12:00:53.084+05 | 1 year
1001 | 3026-01-31 12:00:53.084+05 | 1001 years
2001 | 4026-01-31 12:00:53.084+05 | 2001 years
3001 | 5026-01-31 12:00:53.084+05 | 3001 years
4001 | 6026-01-31 12:00:53.084+05 | 4001 years
5001 | 7026-01-31 12:00:53.085+05 | 5001 years
6001 | 8026-01-31 12:00:53.085+05 | 6001 years
7001 | 9026-01-31 12:00:53.085+05 | 7001 years
8001 | 10026-01-31 12:00:53.085+05 | 8001 years
(9 rows)

Something like following queries might be workable for example?

create table test (c serial, d uuid, t timestamptz generated always as
(uuid_extract_timestamp(d)) stored);
insert into test (d) select uuidv7((n || 'years')::interval) from
generate_series(1, 2000) n;
select count(*) from (select t - lag(t) over (order by c) as diff from
test) where diff > '10 year' ;

Yeah, makes sense. I reduced tolerance to 366+1 day. Must be stable if we've done all the time offset business right.

Here are some review comments:

#define NS_PER_S INT64CONST(1000000000)
#define NS_PER_MS INT64CONST(1000000)
+#define US_PER_MS INT64CONST(1000)
#define NS_PER_US INT64CONST(1000)

I think it's clear if we put US_PER_MS below NS_PER_US.

OK.

---
*
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * ns_in_ms is a number of nanoseconds within millisecond. These values are
* used for time-dependent bits of UUID.

I think we can mention that the RFC describes that stored unix
timestamp as an unsigned integer.

Done. Feel free to adjust my wordings, I've no sense of idiomatic English.

---
static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms)

How about renaming ns_in_ms with sub_ms?

OK.

---
+        /* 64 bits is enough for real time, but not for a time range of UUID */
I could not understand the point of this comment. It seems to say that
64-bits is not enough for a time range of UUID, but doesn't the time
range of UUIDv7 use only 48 bits? It seems to need more comments.

I've tried to say that acquiring current time as an int64 ns since UNIX epoch is still viable for the code (until year 2262).
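
The year limits mentioned here can be checked with a little arithmetic. The sketch below is only an approximation: overflow_year is a hypothetical helper, and it assumes an average Gregorian year of 365.2425 days and ignores leap seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Average Gregorian year (365.2425 days) in seconds; an approximation. */
#define SECS_PER_AVG_YEAR	UINT64_C(31556952)

/*
 * Hypothetical helper: the calendar year in which a counter that started at
 * the UNIX epoch (1970) reaches max_units, given how many units make up one
 * second.
 */
static inline uint64_t
overflow_year(uint64_t max_units, uint64_t units_per_sec)
{
	assert(units_per_sec > 0);
	return 1970 + max_units / units_per_sec / SECS_PER_AVG_YEAR;
}
```

Under these assumptions, overflow_year((uint64_t) INT64_MAX, 1000000000) gives 2262 for a signed 64-bit nanosecond counter, and overflow_year((UINT64_C(1) << 48) - 1, 1000) gives 10889 for the unsigned 48-bit millisecond field, close to the year-10888 figure used in the thread.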

---
-        ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
-                * NS_PER_US + ns % NS_PER_US;
+        us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC);
/* Generate an UUIDv7 */
-        uuid = generate_uuidv7(ns);
+        uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) *
NS_PER_US + ns % NS_PER_US);
Need to update comments in uuidv7_interval() such as:

/*
* Shift the current timestamp by the given interval. To calculate time
* shift correctly, we convert the UNIX epoch to TimestampTz and use
* timestamptz_pl_interval(). Since this calculation is done with
* microsecond precision, we carry nanoseconds from original ns value to
* shifted ns value.
*/

and

/*
* Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
*/

I've tried. I'm not very satisfied with comments, but could not come up with easier description.

Thanks!

Best regards, Andrey Borodin.

Attachments:

v3-0001-UUDv7-fix-offset-computations-in-dates-after-2262.patch (application/octet-stream)

From cd78c0872ec9791e100e4569a980f5988ec5a13d Mon Sep 17 00:00:00 2001
From: Andrey Borodin <amborodin@acm.org>
Date: Fri, 31 Jan 2025 12:03:16 +0500
Subject: [PATCH v3] UUDv7: fix offset computations in dates after 2262

We used nanosecond representation of offsetted time values which
cannot be stored in 64-bit integer for dates significantly after
beginning of UNIX epoch. To prevent overflow we separate millisecond
part from nanoseconds, thus allowing us to store both parts in 64-bit
integers.
---
 doc/src/sgml/func.sgml             |  6 ++---
 src/backend/utils/adt/uuid.c       | 37 ++++++++++++++++--------------
 src/include/catalog/pg_proc.dat    |  2 +-
 src/test/regress/expected/uuid.out | 15 +++++++++++-
 src/test/regress/sql/uuid.sql      | 12 +++++++++-
 5 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7efc81936a..8cf3e374b8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14330,12 +14330,12 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 </synopsis>
    These functions return a version 4 (random) UUID.
 <synopsis>
-<function>uuidv7</function> (<optional> <parameter>shift</parameter> <type>interval</type> </optional>) <returnvalue>uuid</returnvalue>
+<function>uuidv7</function> (<optional> <parameter>offset</parameter> <type>interval</type> </optional>) <returnvalue>uuid</returnvalue>
 </synopsis>
     This function returns a version 7 UUID (UNIX timestamp with millisecond
     precision + sub-millisecond timestamp + random). This function can accept
-    optional <parameter>shift</parameter> parameter of type <type>interval</type>
-    which shift internal timestamp by the given interval.
+    optional <parameter>offset</parameter> parameter of type <type>interval</type>
+    which offset internal timestamp by the given interval.
   </para>
 
   <para>
diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f8402ef92..f368081cc4 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -29,6 +29,7 @@
 #define NS_PER_S	INT64CONST(1000000000)
 #define NS_PER_MS	INT64CONST(1000000)
 #define NS_PER_US	INT64CONST(1000)
+#define US_PER_MS	INT64CONST(1000)
 
 /*
  * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
@@ -69,6 +70,7 @@ static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
 static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
 static inline int64 get_real_time_ns_ascending();
+static pg_uuid_t *generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms);
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -523,18 +525,18 @@ get_real_time_ns_ascending()
  * described in the RFC. This method utilizes 12 bits from the "rand_a" bits
  * to store a 1/4096 (or 2^12) fraction of sub-millisecond precision.
  *
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * sub_ms is a number of nanoseconds within millisecond. These values are
  * used for time-dependent bits of UUID.
+ *
+ * NB: all numbers here are unsigned, unix_ts_ms cannot be negative per RFC.
  */
 static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
-	int64		unix_ts_ms;
 	int32		increased_clock_precision;
 
-	unix_ts_ms = ns / NS_PER_MS;
-
 	/* Fill in time part */
 	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
 	uuid->data[1] = (unsigned char) (unix_ts_ms >> 32);
@@ -547,7 +549,7 @@ generate_uuidv7(int64 ns)
 	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
 	 * SUBMS_MINIMAL_STEP_BITS)
 	 */
-	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+	increased_clock_precision = ((sub_ms) * (1 << SUBMS_BITS)) / NS_PER_MS;
 
 	/* Fill the increased clock precision to "rand_a" bits */
 	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
@@ -586,7 +588,8 @@ generate_uuidv7(int64 ns)
 Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
-	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+	int64		ns = get_real_time_ns_ascending();
+	pg_uuid_t  *uuid = generate_uuidv7(ns / NS_PER_MS, ns % NS_PER_MS);
 
 	PG_RETURN_UUID_P(uuid);
 }
@@ -600,14 +603,17 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	Interval   *shift = PG_GETARG_INTERVAL_P(0);
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
+	/*
+	 * 64 bits is enough for ns in our centuries(until 2200-ies), but not for
+	 * a whole time range of UUID (year 10888).
+	 */
 	int64		ns = get_real_time_ns_ascending();
+	int64		us;
 
 	/*
 	 * Shift the current timestamp by the given interval. To calculate time
 	 * shift correctly, we convert the UNIX epoch to TimestampTz and use
-	 * timestamptz_pl_interval(). Since this calculation is done with
-	 * microsecond precision, we carry nanoseconds from original ns value to
-	 * shifted ns value.
+	 * timestamptz_pl_interval().
 	 */
 
 	ts = (TimestampTz) (ns / NS_PER_US) -
@@ -618,14 +624,11 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 												 TimestampTzGetDatum(ts),
 												 IntervalPGetDatum(shift)));
 
-	/*
-	 * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
-	 */
-	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
-		* NS_PER_US + ns % NS_PER_US;
+	/* Convert a TimestampTz value back to an UNIX epoch in us */
+	us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC);
 
-	/* Generate an UUIDv7 */
-	uuid = generate_uuidv7(ns);
+	/* Generate an UUIDv7, not forgetting ns remainder */
+	uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) * NS_PER_US + ns % NS_PER_US);
 
 	PG_RETURN_UUID_P(uuid);
 }
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 5b8c2ad2a5..c28cace01a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -9392,7 +9392,7 @@
   proname => 'uuidv7', provolatile => 'v',
   prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
 { oid => '9897', descr => 'generate UUID version 7 with a timestamp shifted by specified interval',
-  proname => 'uuidv7', provolatile => 'v', proargnames => '{shift}',
+  proname => 'uuidv7', provolatile => 'v', proargnames => '{offset}',
   prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7_interval' },
 { oid => '6342', descr => 'extract timestamp from UUID',
   proname => 'uuid_extract_timestamp', proleakproof => 't',
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 798633ad51..96e93fbb28 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -290,5 +290,18 @@ SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
  
 (1 row)
 
+-- offset generation
+CREATE TABLE guid4(c SERIAL, d uuid, t timestamptz generated always as
+(uuid_extract_timestamp(d)) stored);
+-- generate UUIDs up to year 10000
+INSERT INTO guid4 (d) SELECT uuidv7((n || 'years')::interval) FROM generate_series(1, 8000) n; -- should work fine until year 28888 = 10888 (end of UUIDv7) - 8000
+SELECT count(*) FROM 
+	(SELECT t - lag(t) OVER (ORDER BY c) AS diff FROM guid4)
+WHERE diff > '367 days'; -- If UUIDs would be generated instantly and without overlap we would have up to '366 days'. One day is extra tolerance in case of machine stalls
+ count 
+-------
+     0
+(1 row)
+
 -- clean up
-DROP TABLE guid1, guid2, guid3 CASCADE;
+DROP TABLE guid1, guid2, guid3, guid4 CASCADE;
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 110188361d..9b0bec4912 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -134,6 +134,16 @@ SELECT uuid_extract_timestamp('017F22E2-79B0-7CC3-98C4-DC0C0C07398F') = 'Tuesday
 SELECT uuid_extract_timestamp(gen_random_uuid());  -- null
 SELECT uuid_extract_timestamp('11111111-1111-1111-1111-111111111111');  -- null
 
+-- offset generation
+CREATE TABLE guid4(c SERIAL, d uuid, t timestamptz generated always as
+(uuid_extract_timestamp(d)) stored);
+
+-- generate UUIDs up to year 10000
+INSERT INTO guid4 (d) SELECT uuidv7((n || 'years')::interval) FROM generate_series(1, 8000) n; -- should work fine until year 28888 = 10888 (end of UUIDv7) - 8000
+
+SELECT count(*) FROM 
+	(SELECT t - lag(t) OVER (ORDER BY c) AS diff FROM guid4)
+WHERE diff > '367 days'; -- If UUIDs would be generated instantly and without overlap we would have up to '366 days'. One day is extra tolerance in case of machine stalls
 
 -- clean up
-DROP TABLE guid1, guid2, guid3 CASCADE;
+DROP TABLE guid1, guid2, guid3, guid4 CASCADE;
-- 
2.39.5 (Apple Git-154)

#224

Andrew Alsup

bluesbreaker@gmail.com

11 months ago

In reply to: Sergey Prokhorenko (#222)

Re: UUID v7

Sergey,

I took a look at your patch for chapter 9.14 "UUID Functions" docs page.
You've added some really good content here. I think section 9.14.4.
"Deciding Whether and Which UUID to Use" would be better suited for Chapter
8: "Data Types" -- specifically, 8.12. "UUID Type", since the content seems
to deal more with the UUID data type than the UUID functions.

Best regards,
Andy A.

On Sun, Feb 16, 2025 at 8:13 PM Sergey Prokhorenko <
sergeyprokhorenko@yahoo.com.au> wrote:


On Wednesday 5 February 2025 at 01:07:02 am GMT+3, Masahiko Sawada <
sawada.mshk@gmail.com> wrote:

On Sun, Feb 2, 2025 at 11:41 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:

Dear colleagues,

I would like to present for discussion my attached new draft

documentation on UUID functions (Section 9.14. UUID Functions), which
replaces the previously proposed draft at
https://www.postgresql.org/docs/devel/functions-uuid.html. I have
preserved and significantly supplemented the text that was there.

I have the following goals:

1. State that from now on, the function uuidv7(), rather than

autoincrement, is the default choice for generating primary keys

2. Describe the advantages of uuidv7() over autoincrement and uuidv4()

3. Refute the often-cited imaginary disadvantages of UUIDv7 compared to

autoincrement, such as:

- Lower performance (see the refutation in the article "UUID

Benchmark War" https://ardentperf.com/2024/02/03/uuid-benchmark-war/)

- Disclosure of date and time of record creation in the table (in

reality, the timestamp offset parameter distorts this information)

4. Confirm the fault tolerance of the uuidv7() function in all possible

critical situations, namely:

- System clock failure

- Receiving an invalid value of the offset argument, which would

result in a timestamp overflow or a negative timestamp

Thank you for the proposal. Could you share the proposed document as a
.diff or .patch file? That would be easier to review the updates.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

_________________________________________________________

Dear colleagues,

I have attached a file changed.diff containing my proposed changes to the
documentation on UUID functions (Section 9.14. UUID Functions). The text
of the changes in this file is updated, and it is slightly different from
the previously submitted text.

Regards,

Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

#225

david.g.johnston@gmail.com

11 months ago

In reply to: Andrew Alsup (#224)

Re: UUID v7

On Sun, Feb 16, 2025 at 6:21 PM Andrew Alsup <bluesbreaker@gmail.com> wrote:

Sergey,

I took a look at your patch for chapter 9.14 "UUID Functions" docs page.
You've added some really good content here. I think section 9.14.4.
"Deciding Whether and Which UUID to Use" would be better suited for Chapter
8: "Data Types" -- specifically, 8.12. "UUID Type", since the content seems
to deal more with the UUID data type than the UUID functions.

Any chance we can bring some organization to this work? The subject line
is too vague to be helpful and thus the thread itself seems to be torn
between fixing stuff in UUID v7 and, separately, making a documentation
enhancement. I don't see a commitfest entry yet - so how about just
starting two new, well-named, threads, with a summary and current patch
proposal for each topic? Though I do sense some overlap such that some of
the content probably needs to be written up assuming the fix patch goes in
first, then the documentation patch. We can always tweak that should the
documentation patch take the lead.

I haven't followed the technical subthread but was asked to comment on the
documentation work off-list (spending enough time editing the DOCX original
and imposing what seems to be the current proposed structure to at least
warrant a mention). A decent part of my commentary was basically that a
lot of this material seems outside the scope of what we cover in our
documentation generally. I'd probably be ok with moving it to an appendix
and making sure the relevant places have links to there. More than just
UUID needs to be altered if you begin comparing UUID to bigint - you need
to add some content to bigint too.

Commenting specifically on the 4 goals:

1. State that from now on, the function uuidv7(), rather than
autoincrement, is the default choice for generating primary keys

We don't make these judgements in the documentation, typically. Happy to
be pointed to exceptions. Simple enough to avoid saying:

"The uuidv7 function is designed as the preferred method for generating
primary keys" and instead just say "...it can be used as an identifier
generator and see Appendix Z for how it compares to other methods."

2. Describe the advantages of uuidv7() over autoincrement and uuidv4()

This is fine; but probably appendix material.

3. Refute the often-cited imaginary disadvantages of UUIDv7 compared to
autoincrement,

This also seems strictly out-of-place; better suited to a Wiki page than
the documentation. Though making statements of fact, and possibly the
occasional clarification of a non-fact, would likely fit inline. Or as
part of the Appendix created to hold the table in goal #2.

4. Confirm the fault tolerance of the uuidv7() function in all possible
critical situations,

This fits into the user-facing specification of the generator function in
some manner.

I did have some alternative text for all this which I'll share, ideally
when there is a clear place to put it up for discussion.

David J.

#226

sergeyprokhorenko@yahoo.com.au

11 months ago

In reply to: Andrew Alsup (#224)

Re: UUID v7

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Monday 17 February 2025 at 04:21:31 am GMT+3, Andrew Alsup <bluesbreaker@gmail.com> wrote:

Sergey,
I took a look at your patch for chapter 9.14 "UUID Functions" docs page. You've added some really good content here. I think section 9.14.4. "Deciding Whether and Which UUID to Use" would be better suited for Chapter 8: "Data Types" -- specifically, 8.12. "UUID Type", since the content seems to deal more with the UUID data type than the UUID functions.
Best regards,
Andy A.

________________________________________________
Andrew,
I don't think so. The documentation text specifically concerns identifiers generated by these functions, and not UUID version 4 or 7 in general, or the UUID type itself.

Regards,
Sergey Prokhorenko
sergeyprokhorenko@yahoo.com.au

#227

david.g.johnston@gmail.com

11 months ago

In reply to: Sergey Prokhorenko (#226)

Re: UUID v7

On Mon, Feb 17, 2025 at 12:57 PM Sergey Prokhorenko <
sergeyprokhorenko@yahoo.com.au> wrote:

Sergey Prokhorenko sergeyprokhorenko@yahoo.com.au

On Monday 17 February 2025 at 04:21:31 am GMT+3, Andrew Alsup <
bluesbreaker@gmail.com> wrote:

Sergey,

I took a look at your patch for chapter 9.14 "UUID Functions" docs page.
You've added some really good content here. I think section 9.14.4.
"Deciding Whether and Which UUID to Use" would be better suited for Chapter
8: "Data Types" -- specifically, 8.12. "UUID Type", since the content seems
to deal more with the UUID data type than the UUID functions.

Best regards,
Andy A.

________________________________________________

Andrew,

I don't think so. The documentation text specifically concerns identifiers
generated by these functions, and not UUID version 4 or 7 in general, or
the UUID type itself.

That likely just means we need to talk about the various UUID versions
within the data type section. This is a property of the stored data
independent of how those values are generated. It isn't like this behaves
differently if the user computes it in the application and passes it in
compared to using the built-in function.

David J.

#228

david.g.johnston@gmail.com

11 months ago

In reply to: Andrey M. Borodin (#200)

Re: UUID v7

On Monday, February 17, 2025, Sergey Prokhorenko <
sergeyprokhorenko@yahoo.com.au> wrote:

This means exactly the opposite of what you wrote. There is a big
difference between UUID versions and data types. The properties of
identifiers strongly depend on the implementation of the generating function. Don't
give advice without studying the subject area.

I learn by doing and will make mistakes; in this case, by being imprecise and
not making my point with concrete examples.

But can we please stop this thread-jacking and give these two patches
proper homes? This thread should have died on the 12th with the “both
patches pushed” message, IMHO.

David J.


#229

sawada.mshk@gmail.com

10 months ago

In reply to: Andrey Borodin (#223)

1 attachment(s)

Re: UUID v7

On Sun, Feb 9, 2025 at 9:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I've taken into account Sergey's note that "offset" is a better name for the uuidv7() argument than "shift".

On 5 Feb 2025, at 03:02, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I was thinking about incorporating test like this.

With this patch we can generate correct UUIDs in a very distant future.
postgres=# select x, uuid_extract_timestamp(uuidv7((x::text || ' year'::text)::interval)),
(x::text || ' year'::text)::interval
from generate_series(1,9000,1000) x;
  x   |   uuid_extract_timestamp    |  interval
------+-----------------------------+------------
    1 | 2026-01-31 12:00:53.084+05  | 1 year
 1001 | 3026-01-31 12:00:53.084+05  | 1001 years
 2001 | 4026-01-31 12:00:53.084+05  | 2001 years
 3001 | 5026-01-31 12:00:53.084+05  | 3001 years
 4001 | 6026-01-31 12:00:53.084+05  | 4001 years
 5001 | 7026-01-31 12:00:53.085+05  | 5001 years
 6001 | 8026-01-31 12:00:53.085+05  | 6001 years
 7001 | 9026-01-31 12:00:53.085+05  | 7001 years
 8001 | 10026-01-31 12:00:53.085+05 | 8001 years
(9 rows)

Something like following queries might be workable for example?

create table test (c serial, d uuid, t timestamptz generated always as
(uuid_extract_timestamp(d)) stored);
insert into test (d) select uuidv7((n || 'years')::interval) from
generate_series(1, 2000) n;
select count(*) from (select t - lag(t) over (order by c) as diff from
test) where diff > '10 year' ;

Yeah, makes sense. I reduced tolerance to 366+1 day. Must be stable if we've done all the time offset business right.

Here are some review comments:

#define NS_PER_S INT64CONST(1000000000)
#define NS_PER_MS INT64CONST(1000000)
+#define US_PER_MS INT64CONST(1000)
#define NS_PER_US INT64CONST(1000)

I think it's clearer if we put US_PER_MS below NS_PER_US.

OK.
---
*
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * ns_in_ms is a number of nanoseconds within millisecond. These values are
* used for time-dependent bits of UUID.
I think we can mention that the RFC describes that stored unix
timestamp as an unsigned integer.
Done. Feel free to adjust my wording; I have no sense of idiomatic English.
---
static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 ns_in_ms)
How about renaming ns_in_ms with sub_ms?
OK.
---
+        /* 64 bits is enough for real time, but not for a time range of UUID */
I could not understand the point of this comment. It seems to say that
64-bits is not enough for a time range of UUID, but doesn't the time
range of UUIDv7 use only 48 bits? It seems to need more comments.
I've tried to say that acquiring current time as an int64 ns since UNIX epoch is still viable for the code (until year 2262).
---
-        ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC)
-                * NS_PER_US + ns % NS_PER_US;
+        us = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
SECS_PER_DAY * USECS_PER_SEC);
/* Generate an UUIDv7 */
-        uuid = generate_uuidv7(ns);
+        uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) *
NS_PER_US + ns % NS_PER_US);
Need to update comments in uuidv7_interval() such as:

/*
* Shift the current timestamp by the given interval. To calculate time
* shift correctly, we convert the UNIX epoch to TimestampTz and use
* timestamptz_pl_interval(). Since this calculation is done with
* microsecond precision, we carry nanoseconds from original ns value to
* shifted ns value.
*/

and

/*
* Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
*/
I've tried. I'm not very satisfied with the comments, but I could not come up with a simpler description.

Thank you for updating the patch. I had lost track of this patch.

I've updated the patch from your v4 patch. In this version, I excluded
the argument name change (from 'shift' to 'offset') as it's not
related to the bug fix and simplified the regression test case.

Please review it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-Fix-timestamp-overflow-in-UUIDv7-implementation.patch (application/octet-stream)

From 82ba268b7af93a75c76cf36a85c764761e0dbeb1 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 25 Mar 2025 15:14:42 -0700
Subject: [PATCH v5] Fix timestamp overflow in UUIDv7 implementation.

Previously, the uuidv7_interval() function performed timestamp
shifting calculations using microsecond precision, but then converted
the result back to nanosecond precision. Since the millisecond and
sub-millisecond parts were extracted from this nanosecond timestamp
and stored into the UUIDv7 value, overflow occurred for timestamps
beyond the year 2262.

With this commit, the millisecond and sub-millisecond parts are stored
directly into the UUIDv7 value without being converted back to a
nanosecond precision timestamp. Following RFC 9562, the timestamp is
stored as an unsigned integer, enabling support for dates up to the
year 10889.

Reported and fixed by Andrey Borodin, with cosmetic changes and
regression tests by me.

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/96DEC2D9-659A-40E8-B7BA-AF5D162A9E21@yandex-team.ru
---
 src/backend/utils/adt/uuid.c       | 34 +++++++++++++++---------------
 src/test/regress/expected/uuid.out | 14 ++++++++++++
 src/test/regress/sql/uuid.sql      | 11 ++++++++++
 3 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f8402ef925..be0f0f9f1ce 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -29,6 +29,7 @@
 #define NS_PER_S	INT64CONST(1000000000)
 #define NS_PER_MS	INT64CONST(1000000)
 #define NS_PER_US	INT64CONST(1000)
+#define US_PER_MS	INT64CONST(1000)
 
 /*
  * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
@@ -69,6 +70,7 @@ static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
 static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
 static inline int64 get_real_time_ns_ascending();
+static pg_uuid_t *generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms);
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -523,17 +525,17 @@ get_real_time_ns_ascending()
  * described in the RFC. This method utilizes 12 bits from the "rand_a" bits
  * to store a 1/4096 (or 2^12) fraction of sub-millisecond precision.
  *
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * and sub_ms is a number of nanoseconds within millisecond. These values are
  * used for time-dependent bits of UUID.
+ *
+ * NB: all numbers here are unsigned, unix_ts_ms cannot be negative per RFC.
  */
 static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
-	int64		unix_ts_ms;
-	int32		increased_clock_precision;
-
-	unix_ts_ms = ns / NS_PER_MS;
+	uint32		increased_clock_precision;
 
 	/* Fill in time part */
 	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
@@ -547,7 +549,7 @@ generate_uuidv7(int64 ns)
 	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
 	 * SUBMS_MINIMAL_STEP_BITS)
 	 */
-	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+	increased_clock_precision = (sub_ms * (1 << SUBMS_BITS)) / NS_PER_MS;
 
 	/* Fill the increased clock precision to "rand_a" bits */
 	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
@@ -586,7 +588,8 @@ generate_uuidv7(int64 ns)
 Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
-	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+	int64		ns = get_real_time_ns_ascending();
+	pg_uuid_t  *uuid = generate_uuidv7(ns / NS_PER_MS, ns % NS_PER_MS);
 
 	PG_RETURN_UUID_P(uuid);
 }
@@ -601,13 +604,13 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
 	int64		ns = get_real_time_ns_ascending();
+	int64		us;
 
 	/*
 	 * Shift the current timestamp by the given interval. To calculate time
 	 * shift correctly, we convert the UNIX epoch to TimestampTz and use
-	 * timestamptz_pl_interval(). Since this calculation is done with
-	 * microsecond precision, we carry nanoseconds from original ns value to
-	 * shifted ns value.
+	 * timestamptz_pl_interval(). This calculation is done with microsecond
+	 * precision.
 	 */
 
 	ts = (TimestampTz) (ns / NS_PER_US) -
@@ -618,14 +621,11 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 												 TimestampTzGetDatum(ts),
 												 IntervalPGetDatum(shift)));
 
-	/*
-	 * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
-	 */
-	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
-		* NS_PER_US + ns % NS_PER_US;
+	/* Convert a TimestampTz value back to an UNIX epoch timestamp */
+	us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 	/* Generate an UUIDv7 */
-	uuid = generate_uuidv7(ns);
+	uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) * NS_PER_US + ns % NS_PER_US);
 
 	PG_RETURN_UUID_P(uuid);
 }
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 798633ad51e..cbd497376c4 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -233,6 +233,20 @@ SELECT array_agg(id ORDER BY guid_field) FROM guid3;
  {1,2,3,4,5,6,7,8,9,10}
 (1 row)
 
+-- Check the timestamp offsets for v7.
+--
+-- generate UUIDv7 having timestamps up to 10889 year, which is the maximum year
+-- can be stored in UUIDv7, and then check if the timestamps extracted from UUIDv7
+-- values are not overflowed.
+WITH uuidts AS (
+     SELECT y, ts as ts, lag(ts) OVER (ORDER BY y) AS prev_ts
+     FROM (SELECT y, uuid_extract_timestamp(uuidv7((y || ' years')::interval)) AS ts FROM generate_series(-50, 10889 - extract(year from now())::int) y)
+)
+SELECT y, ts, prev_ts FROM uuidts WHERE ts < prev_ts;
+ y | ts | prev_ts 
+---+----+---------
+(0 rows)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 110188361d1..cd0e65d3a8b 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -119,6 +119,17 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
+-- Check the timestamp offsets for v7.
+--
+-- generate UUIDv7 having timestamps up to 10889 year, which is the maximum year
+-- can be stored in UUIDv7, and then check if the timestamps extracted from UUIDv7
+-- values are not overflowed.
+WITH uuidts AS (
+     SELECT y, ts as ts, lag(ts) OVER (ORDER BY y) AS prev_ts
+     FROM (SELECT y, uuid_extract_timestamp(uuidv7((y || ' years')::interval)) AS ts FROM generate_series(-50, 10889 - extract(year from now())::int) y)
+)
+SELECT y, ts, prev_ts FROM uuidts WHERE ts < prev_ts;
+
 -- extract functions
 
 -- version
-- 
2.43.5

#230

x4mmm@yandex-team.ru

10 months ago

In reply to: Masahiko Sawada (#229)

Re: UUID v7

On 26 Mar 2025, at 08:32, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Please review it.

The patch looks good to me except one nit.
+WITH uuidts AS (
+     SELECT y, ts as ts, lag(ts) OVER (ORDER BY y) AS prev_ts
+     FROM (SELECT y, uuid_extract_timestamp(uuidv7((y || ' years')::interval)) AS ts FROM generate_series(-50, 10889 - extract(year from now())::int) y)
+)
+SELECT y, ts, prev_ts FROM uuidts WHERE ts < prev_ts;

If "extract(year from now())::int" is evaluated just before the New Year and the rest of the test runs just after it, the test will fail. How about avoiding the overflow by using 10888 instead of 10889?

If we are sure civil time will never go backwards, IMO we can safely move the other boundary back to -55.

Also, the test is not proof against an NTP clock adjustment right at the New Year boundary, but that is hardly a problem: the test run, an NTP clock sync, and the New Year millisecond would all have to coincide to produce a false failure.

Thank you!

Best regards, Andrey Borodin.

#231

sawada.mshk@gmail.com

10 months ago

In reply to: Andrey Borodin (#230)

1 attachment(s)

Re: UUID v7

On Wed, Mar 26, 2025 at 6:00 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 26 Mar 2025, at 08:32, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Please review it.
The patch looks good to me except one nit.
+WITH uuidts AS (
+     SELECT y, ts as ts, lag(ts) OVER (ORDER BY y) AS prev_ts
+     FROM (SELECT y, uuid_extract_timestamp(uuidv7((y || ' years')::interval)) AS ts FROM generate_series(-50, 10889 - extract(year from now())::int) y)
+)
+SELECT y, ts, prev_ts FROM uuidts WHERE ts < prev_ts;
If "extract(year from now())::int" is evaluated just before the New Year and the rest of the test runs just after it, the test will fail. How about avoiding the overflow by using 10888 instead of 10889?

Agreed. I've done this in the attached patch.

If we are sure civil time will never go backwards, IMO we can safely move the other boundary back to -55.

Yes. Or I think we can verify the range from 1970 to 10888 like I did
in the updated patch.

Also, the test is not proof against an NTP clock adjustment right at the New Year boundary, but that is hardly a problem: the test run, an NTP clock sync, and the New Year millisecond would all have to coincide to produce a false failure.

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Fix-timestamp-overflow-in-UUIDv7-implementation.patch (application/octet-stream)

From 0baca061a9ff508b81d90c514c498d9cb03a63b2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Tue, 25 Mar 2025 15:14:42 -0700
Subject: [PATCH v6] Fix timestamp overflow in UUIDv7 implementation.

Previously, the uuidv7_interval() function performed timestamp
shifting calculations using microsecond precision, but then converted
the result back to nanosecond precision. Since the millisecond and
sub-millisecond parts were extracted from this nanosecond timestamp
and stored into the UUIDv7 value, overflow occurred for timestamps
beyond the year 2262.

With this commit, the millisecond and sub-millisecond parts are stored
directly into the UUIDv7 value without being converted back to a
nanosecond precision timestamp. Following RFC 9562, the timestamp is
stored as an unsigned integer, enabling support for dates up to the
year 10889.

Reported and fixed by Andrey Borodin, with cosmetic changes and
regression tests by me.

Reported-by: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/96DEC2D9-659A-40E8-B7BA-AF5D162A9E21@yandex-team.ru
---
 src/backend/utils/adt/uuid.c       | 34 +++++++++++++++---------------
 src/test/regress/expected/uuid.out | 15 +++++++++++++
 src/test/regress/sql/uuid.sql      | 12 +++++++++++
 3 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/src/backend/utils/adt/uuid.c b/src/backend/utils/adt/uuid.c
index 4f8402ef925..be0f0f9f1ce 100644
--- a/src/backend/utils/adt/uuid.c
+++ b/src/backend/utils/adt/uuid.c
@@ -29,6 +29,7 @@
 #define NS_PER_S	INT64CONST(1000000000)
 #define NS_PER_MS	INT64CONST(1000000)
 #define NS_PER_US	INT64CONST(1000)
+#define US_PER_MS	INT64CONST(1000)
 
 /*
  * UUID version 7 uses 12 bits in "rand_a" to store  1/4096 (or 2^12) fractions of
@@ -69,6 +70,7 @@ static bool uuid_abbrev_abort(int memtupcount, SortSupport ssup);
 static Datum uuid_abbrev_convert(Datum original, SortSupport ssup);
 static inline void uuid_set_version(pg_uuid_t *uuid, unsigned char version);
 static inline int64 get_real_time_ns_ascending();
+static pg_uuid_t *generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms);
 
 Datum
 uuid_in(PG_FUNCTION_ARGS)
@@ -523,17 +525,17 @@ get_real_time_ns_ascending()
  * described in the RFC. This method utilizes 12 bits from the "rand_a" bits
  * to store a 1/4096 (or 2^12) fraction of sub-millisecond precision.
  *
- * ns is a number of nanoseconds since start of the UNIX epoch. This value is
+ * unix_ts_ms is a number of milliseconds since start of the UNIX epoch,
+ * and sub_ms is a number of nanoseconds within millisecond. These values are
  * used for time-dependent bits of UUID.
+ *
+ * NB: all numbers here are unsigned, unix_ts_ms cannot be negative per RFC.
  */
 static pg_uuid_t *
-generate_uuidv7(int64 ns)
+generate_uuidv7(uint64 unix_ts_ms, uint32 sub_ms)
 {
 	pg_uuid_t  *uuid = palloc(UUID_LEN);
-	int64		unix_ts_ms;
-	int32		increased_clock_precision;
-
-	unix_ts_ms = ns / NS_PER_MS;
+	uint32		increased_clock_precision;
 
 	/* Fill in time part */
 	uuid->data[0] = (unsigned char) (unix_ts_ms >> 40);
@@ -547,7 +549,7 @@ generate_uuidv7(int64 ns)
 	 * sub-millisecond timestamp fraction (SUBMS_BITS bits, not
 	 * SUBMS_MINIMAL_STEP_BITS)
 	 */
-	increased_clock_precision = ((ns % NS_PER_MS) * (1 << SUBMS_BITS)) / NS_PER_MS;
+	increased_clock_precision = (sub_ms * (1 << SUBMS_BITS)) / NS_PER_MS;
 
 	/* Fill the increased clock precision to "rand_a" bits */
 	uuid->data[6] = (unsigned char) (increased_clock_precision >> 8);
@@ -586,7 +588,8 @@ generate_uuidv7(int64 ns)
 Datum
 uuidv7(PG_FUNCTION_ARGS)
 {
-	pg_uuid_t  *uuid = generate_uuidv7(get_real_time_ns_ascending());
+	int64		ns = get_real_time_ns_ascending();
+	pg_uuid_t  *uuid = generate_uuidv7(ns / NS_PER_MS, ns % NS_PER_MS);
 
 	PG_RETURN_UUID_P(uuid);
 }
@@ -601,13 +604,13 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 	TimestampTz ts;
 	pg_uuid_t  *uuid;
 	int64		ns = get_real_time_ns_ascending();
+	int64		us;
 
 	/*
 	 * Shift the current timestamp by the given interval. To calculate time
 	 * shift correctly, we convert the UNIX epoch to TimestampTz and use
-	 * timestamptz_pl_interval(). Since this calculation is done with
-	 * microsecond precision, we carry nanoseconds from original ns value to
-	 * shifted ns value.
+	 * timestamptz_pl_interval(). This calculation is done with microsecond
+	 * precision.
 	 */
 
 	ts = (TimestampTz) (ns / NS_PER_US) -
@@ -618,14 +621,11 @@ uuidv7_interval(PG_FUNCTION_ARGS)
 												 TimestampTzGetDatum(ts),
 												 IntervalPGetDatum(shift)));
 
-	/*
-	 * Convert a TimestampTz value back to an UNIX epoch and back nanoseconds.
-	 */
-	ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC)
-		* NS_PER_US + ns % NS_PER_US;
+	/* Convert a TimestampTz value back to an UNIX epoch timestamp */
+	us = ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY * USECS_PER_SEC;
 
 	/* Generate an UUIDv7 */
-	uuid = generate_uuidv7(ns);
+	uuid = generate_uuidv7(us / US_PER_MS, (us % US_PER_MS) * NS_PER_US + ns % NS_PER_US);
 
 	PG_RETURN_UUID_P(uuid);
 }
diff --git a/src/test/regress/expected/uuid.out b/src/test/regress/expected/uuid.out
index 798633ad51e..2e868457d63 100644
--- a/src/test/regress/expected/uuid.out
+++ b/src/test/regress/expected/uuid.out
@@ -233,6 +233,21 @@ SELECT array_agg(id ORDER BY guid_field) FROM guid3;
  {1,2,3,4,5,6,7,8,9,10}
 (1 row)
 
+-- Check the timestamp offsets for v7.
+--
+-- generate UUIDv7 values with timestamps ranging from 1970 (the Unix epoch year)
+-- to 10888 (one year before the maximum possible year), and then verify that
+-- the extracted timestamps from these UUIDv7 values have not overflowed.
+WITH uuidts AS (
+     SELECT y, ts as ts, lag(ts) OVER (ORDER BY y) AS prev_ts
+     FROM (SELECT y, uuid_extract_timestamp(uuidv7((y || ' years')::interval)) AS ts
+     	  FROM generate_series(1970 - extract(year from now())::int, 10888 - extract(year from now())::int) y)
+)
+SELECT y, ts, prev_ts FROM uuidts WHERE ts < prev_ts;
+ y | ts | prev_ts 
+---+----+---------
+(0 rows)
+
 -- extract functions
 -- version
 SELECT uuid_extract_version('11111111-1111-5111-8111-111111111111');  -- 5
diff --git a/src/test/regress/sql/uuid.sql b/src/test/regress/sql/uuid.sql
index 110188361d1..241d514eb9c 100644
--- a/src/test/regress/sql/uuid.sql
+++ b/src/test/regress/sql/uuid.sql
@@ -119,6 +119,18 @@ SELECT count(DISTINCT guid_field) FROM guid1;
 INSERT INTO guid3 (guid_field) SELECT uuidv7() FROM generate_series(1, 10);
 SELECT array_agg(id ORDER BY guid_field) FROM guid3;
 
+-- Check the timestamp offsets for v7.
+--
+-- generate UUIDv7 values with timestamps ranging from 1970 (the Unix epoch year)
+-- to 10888 (one year before the maximum possible year), and then verify that
+-- the extracted timestamps from these UUIDv7 values have not overflowed.
+WITH uuidts AS (
+     SELECT y, ts as ts, lag(ts) OVER (ORDER BY y) AS prev_ts
+     FROM (SELECT y, uuid_extract_timestamp(uuidv7((y || ' years')::interval)) AS ts
+     	  FROM generate_series(1970 - extract(year from now())::int, 10888 - extract(year from now())::int) y)
+)
+SELECT y, ts, prev_ts FROM uuidts WHERE ts < prev_ts;
+
 -- extract functions
 
 -- version
-- 
2.43.5

#232

Andrei Borodin

x4mmm@yandex-team.ru

10 months ago

In reply to: Masahiko Sawada (#231)

Re: UUID v7

26.03.2025, 21:06, "Masahiko Sawada" <sawada.mshk@gmail.com>:

Agreed. I've done this in the attached patch.

Great! The patch looks good to me.

Best regards, Andrey Borodin.

#233

sawada.mshk@gmail.com

10 months ago

In reply to: Andrei Borodin (#232)

Re: UUID v7

On Wed, Mar 26, 2025 at 12:32 PM Andrei Borodin <x4mmm@yandex-team.ru> wrote:

26.03.2025, 21:06, "Masahiko Sawada" <sawada.mshk@gmail.com>:

Agreed. I've done this in the attached patch.

Great! The patch looks good to me.

Thank you for reviewing it. I'm going to push the fix tomorrow,
barring further comments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

#234