UUID v7
Hello pgsql-hackers!
As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance. They keep data locality for time-ordered values.
From my POV v7 is especially needed for users. Current standard status
is "draft". And I'm not sure it will be accepted before our feature
freeze for PG16. Maybe we could provide a draft implementation in 16
and adjust it to the accepted version if the standard is changed? PFA
patch with implementation.
What do you think?
cc Brad Peabody and Kyzer R. Davis, authors of the standard
cc Kirk Wolak and Nik Samokhvalov who requested the feature
Thanks!
Best regards, Andrey Borodin.
[0]: https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
Attachments:
v1-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream, +72 -2)
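For readers without the patch at hand, here is a minimal sketch of the v7 field layout described in the draft: a 48-bit Unix millisecond timestamp in the most significant bits, followed by version, variant and random bits. It is written in Python for brevity, is not taken from the attached patch, and the name `uuid7_sketch` is hypothetical.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUID with the v7 layout: 48-bit ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80        # bits 127..80: unix_ts_ms
    value |= int.from_bytes(os.urandom(10), "big")   # bits 79..0: random filler
    value = (value & ~(0xF << 76)) | (0x7 << 76)     # version nibble = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)     # variant bits = 0b10
    return uuid.UUID(int=value)


# Identifiers created in later milliseconds compare greater, so inserts into a
# B-tree index land near its right edge instead of on random pages.
ids = [uuid7_sketch(unix_ms=1_700_000_000_000 + i) for i in range(5)]
assert ids == sorted(ids)
```

Within a single millisecond this sketch gives no ordering guarantee, because everything after the timestamp is random.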
Hi,
On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance.
I agree it's very useful to have.
[0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
That looks to not be the current version anymore, it's superseded by:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
It keeps data locality for time-ordered values.
From my POV v7 is especially needed for users. Current standard status
is "draft". And I'm not sure it will be accepted before our feature
freeze for PG16. Maybe we could provide a draft implementation in 16
and adjust it to the accepted version if the standard is changed? PFA
patch with implementation.
Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
turn out to not be one.
Perhaps we should name the function something like
gen_time_ordered_random_uuid() instead? That gives us a bit more flexibility
about what uuid version we generate. And it might be easier for users, anyway.
Still not sure what version we'd best use for now. Perhaps v8?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
From my POV v7 is especially needed for users. Current standard status
is "draft". And I'm not sure it will be accepted before our feature
freeze for PG16. Maybe we could provide a draft implementation in 16
and adjust it to the accepted version if the standard is changed? PFA
patch with implementation.
Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
turn out to not be one.
I think there is no need to rush this into v16. Let's wait for the
standardization process to play out.
regards, tom lane
On Fri, Feb 10, 2023 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
Perhaps we should name the function something like
gen_time_ordered_random_uuid() instead? That gives us a bit more flexibility
about what uuid version we generate. And it might be easier for users, anyway.
I think users would be happy with any name.
Still not sure what version we'd best use for now. Perhaps v8?
V8 is just a "custom data" format. Like "place whatever you want".
Though I agree that its sample implementation looks to be better.
On Fri, Feb 10, 2023 at 5:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm. It seems somewhat worrisome to claim something is a v7 UUID when it might
turn out to not be one.
I think there is no need to rush this into v16. Let's wait for the
standardization process to play out.
Standardization per se does not bring value to users. However, I agree
that eager users can just have it today as an extension and be happy
with it [0].
Maybe it's fine to wait a year for others...
Best regards, Andrey Borodin.
[0]: https://github.com/x4m/pg_uuid_next
On 11.02.23 02:14, Andres Freund wrote:
On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance.
I agree it's very useful to have.
[0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
That looks to not be the current version anymore, it's superseded by:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
Yes, this means that the draft that an individual had uploaded has now
been taken on by a working group for formal review. If there is a
prototype implementation, this is a good time to provide feedback. But
it's too early to ship a production version.
Hello Group,
I am happy to see others interested in the improvements provided by UUIDv7!
I caught up on the thread and you all are correct.
Work has moved on GitHub from uuid6/uuid6-ietf-draft to
ietf-wg-uuidrev/rfc4122bis
- Draft 00 merged RFC4122 with Draft 04 and fixed as many problems as possible
with RFC4122.
- Draft 01 continued to iterate on RFC4122 problems:
https://author-tools.ietf.org/iddiff?url2=draft-ietf-uuidrev-rfc4122bis-01
- Draft 02 items being changed are summarized in the latest PR for review in
the upcoming interim meeting (Feb 16th):
https://github.com/ietf-wg-uuidrev/rfc4122bis/pull/60
Note: Draft 02 should be published by the end of the week and long term we
have one more meeting at IETF 116 to iron out the replacement of RFC4122,
perform last call and submit to the IESG for official review and consideration
for replacement of RFC4122 (actual timeline for that varies based on what IESG
wants me to fix.)
That all being said:
The point is 99% of the work since adoption by the IETF has been ironing out
RFC4122's problems and nothing major related to UUIDv6/7/8 which are all in a
very good state.
If anybody has any feedback from reviewing the draft or prototyping,
please either email uuidrev@ietf.org or drop an issue on the tracker:
https://github.com/ietf-wg-uuidrev/rfc4122bis/issues
Lastly, I have added the C/SQL implementation to the prototypes page below:
https://github.com/uuid6/prototypes
Thanks!
-----Original Message-----
From: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Sent: Saturday, February 11, 2023 10:51 AM
To: Andres Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; brad@peabody.io;
wolakk@gmail.com; Kyzer Davis (kydavis) <kydavis@cisco.com>; Nikolay
Samokhvalov <samokhvalov@gmail.com>
Subject: Re: UUID v7
On 11.02.23 02:14, Andres Freund wrote:
On 2023-02-10 15:57:50 -0800, Andrey Borodin wrote:
As you may know there's a new version of UUID being standardized [0].
These new algorithms of UUID generation are very promising for
database performance.
I agree it's very useful to have.
[0] https://datatracker.ietf.org/doc/html/draft-peabody-dispatch-new-uuid-format-04
That looks to not be the current version anymore, it's superseded by:
https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis
Yes, this means that the draft that an individual had uploaded has now been
taken on by a working group for formal review. If there is a prototype
implementation, this is a good time to provide feedback. But it's too early
to ship a production version.
On Tue, Feb 14, 2023 at 6:13 AM Kyzer Davis (kydavis) <kydavis@cisco.com> wrote:
I am happy to see others interested in the improvements provided by UUIDv7!
Thank you for providing the details!
Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
https://commitfest.postgresql.org/43/4388/
- recent post by Ants Aasma, Cybertec about the downsides of
traditional UUID raised a big discussion today on HN:
https://news.ycombinator.com/item?id=36429986
On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway. Committing anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.
--
Daniel Gustafsson
On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:
On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway. Committing anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.
I'd like to note that this draft has recently had its last call
period, and has been proposed for publishing early last month. I don't
know how long this publishing process usually takes, but it seems like
the WG considers the text final, so unless this would take months I
wouldn't mind keeping this patch around as "waiting for external
process to complete". Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
On Thu, 6 Jul 2023 at 14:24, Daniel Gustafsson <daniel@yesql.se> wrote:
On 22 Jun 2023, at 20:30, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
Some small updates as I see them:
- there is revision 7 now in https://github.com/ietf-wg-uuidrev/rfc4122bis
- noticing that there is no commitfest record, I created one:
I will actually go ahead and close this entry in the current CF, not because we
don't want the feature but because it's unlikely that it will go in now given
that standardization is still underway. Committing anything right now seems
premature, we might as well wait for standardization given that we have lots of
time before the v17 freeze.
I'd like to note that this draft has recently had its last call
period, and has been proposed for publishing early last month.
Sure, but this document is in AD Evaluation and there are many stages left in
the IESG process, it may still take a fair bit of time before this is done.
Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.
I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project. I'm only commenting
on this particular Internet standard which we have plenty of time to commit
before v17 without rushing to beat a standards committee.
Also, if you look you can see that I moved it to the next CF in a vague hope
that standardization will be swift (which it admittedly never is).
--
Daniel Gustafsson
Daniel Gustafsson <daniel@yesql.se> writes:
On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.
I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project.
We have done a couple of things that way recently. An important
reason why we felt we could get away with that is that nowadays
we have people who actually sit on the SQL committee and have
reliable information on what's likely to make it into the final text
of the next version. I don't think we have equivalent visibility or
should have equivalent confidence about how UUID v7 standardization
will play out.
regards, tom lane
On 06.07.23 16:02, Tom Lane wrote:
Daniel Gustafsson <daniel@yesql.se> writes:
On 6 Jul 2023, at 15:29, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
Sure, it's earlier than the actual release of
the standard, but that wasn't a blocker for SQL features that were
considered finalized either.
I can't speak for any SQL standard features we've committed before being
standardized, it's for sure not the norm for the project.
We have done a couple of things that way recently. An important
reason why we felt we could get away with that is that nowadays
we have people who actually sit on the SQL committee and have
reliable information on what's likely to make it into the final text
of the next version. I don't think we have equivalent visibility or
should have equivalent confidence about how UUID v7 standardization
will play out.
(I have been attending some meetings and I'm on the mailing list.)
Anyway, I think it would be reasonable to review this patch now. We
might leave it hanging in "Ready for Committer" for a while when we get
there. But surely review can start now.
On 6 Jul 2023, at 21:38, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
I think it would be reasonable to review this patch now.
+1.
Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. Revision 1 of the IETF draft contained interesting code for v8: very similar to v7, but with fields for "node ID" and "rolling sequence number".
I think this is a reasonable approach, thus I attach an implementation of UUID v8 per [0]. But from my point of view this implementation has some flaws.
These two new fields "node ID" and "sequence" are there not for uniqueness, but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated numbers.
I think that "sequence" is there to help generate locally ascending identifiers when the real-time clock does not provide enough resolution. So the "sequence" field must be placed after the 6 bytes of the time-generated identifier.
On the contrary, "node ID" must differentiate identifiers generated on different nodes. So it makes sense to place "node ID" before the timestamp, so that identifiers generated on different nodes will tend to be in different ranges.
However, section "6.4. Distributed UUID Generation" states that "node ID" is there to decrease the likelihood of a collision. So my intuition might be wrong here.
Do we want to provide this "vendor-specific" UUID with tweaks for databases? Or should we limit the scope to the well-defined UUID v7?
Best regards, Andrey Borodin.
[0]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01
Attachments:
v2-0001-Implement-UUID-v7-and-v8-as-per-IETF-draft.patch (application/octet-stream, +144 -2)
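The placement argument in this message can be illustrated with a toy model. The sketch below packs a timestamp, a sequence and random bits into a 128-bit integer in three different orders. The layouts and function names are hypothetical, version and variant bits are ignored, and none of this reproduces the draft's sample v8 code; it only shows how field position affects sort order.

```python
import random


def pack_seq_after_time(ts_ms, seq, rnd):
    # 48-bit time | 16-bit sequence | 64-bit random
    return (ts_ms << 80) | (seq << 64) | rnd


def pack_seq_at_end(ts_ms, seq, rnd):
    # 48-bit time | 64-bit random | 16-bit sequence
    return (ts_ms << 80) | (rnd << 16) | seq


def pack_node_first(node, ts_ms, rnd):
    # 8-bit node ID | 48-bit time | 72-bit random
    return (node << 120) | (ts_ms << 72) | rnd


rng = random.Random(42)  # fixed seed so the demonstration is repeatable
ts = 1_700_000_000_000
rands = [rng.getrandbits(64) for _ in range(1000)]

# Sequence right after the timestamp: generation order equals sort order,
# even though all values share one millisecond.
early = [pack_seq_after_time(ts, i, r) for i, r in enumerate(rands)]
assert early == sorted(early)

# Sequence after the random bits: the random bits dominate the comparison.
late = [pack_seq_at_end(ts, i, r) for i, r in enumerate(rands)]
assert late != sorted(late)

# Node ID in front: each node gets its own contiguous key range.
node1 = [pack_node_first(1, ts + i, rng.getrandbits(72)) for i in range(100)]
node2 = [pack_node_first(2, ts - i, rng.getrandbits(72)) for i in range(100)]
assert max(node1) < min(node2)
```

Whether per-node ranges are desirable depends on the workload; as the message notes, the draft describes the node ID as a collision-avoidance measure rather than a locality one.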
Great discussions group,
I think it would be reasonable to review this patch now.
I am happy to review the format and logic for any proposed v7 and/or v8
UUID. Just point me to a PR or some code review location.
section "6.4. Distributed UUID Generation" states that "node ID" is there to decrease
the likelihood of a collision.
Correct, node identifiers help provide some bit space that ensures no
collision in the event the stars align and two nodes create the exact
same UUID.
From what I have seen UUIDv7 should meet the requirements outlined thus far
in this thread.
Also to add, there are two UUID prototypes for postgres from my checks.
(Although they are outdated relative to the latest draft sent up for official
publication, so review them from an academic perspective.)
- https://github.com/uuid6/prototypes
- pg_uuid_next (see this thread which nicely summarizes some UUIDv7
"checkboxes" https://github.com/x4m/pg_uuid_next/issues/1)
- UUID_v7_for_Postgres.sql
Don't forget, if we have UUIDv1 already implemented in the postgres code you
may want to examine UUIDv6.
UUIDv6 is simply a fork of that code and swap of the timestamp bits.
In terms of effort UUIDv6 is easy to implement and gives you a time ordered
UUID re-using 99% of the code you may already have.
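The "swap of the timestamp bits" can be sketched as follows. This is an illustration in Python, not PostgreSQL code, and `uuid1_to_uuid6` is a hypothetical helper name. It reads the 60-bit timestamp out of a v1 UUID and writes it back most-significant-bits first, leaving the variant, clock sequence and node untouched.

```python
import uuid


def uuid1_to_uuid6(u1):
    """Reorder the v1 timestamp fields so the value sorts by time."""
    if u1.version != 1:
        raise ValueError("expected a version 1 UUID")
    ts = u1.time                                   # 60-bit count of 100 ns ticks
    time_high = (ts >> 28) & 0xFFFFFFFF            # top 32 bits go first
    time_mid = (ts >> 12) & 0xFFFF                 # next 16 bits
    time_low_and_version = 0x6000 | (ts & 0x0FFF)  # version 6 + low 12 bits
    value = (time_high << 96) | (time_mid << 80) | (time_low_and_version << 64)
    value |= u1.int & 0xFFFFFFFFFFFFFFFF           # variant, clock_seq, node as-is
    return uuid.UUID(int=value)


v1 = uuid.UUID("c232ab00-9414-11ec-b3c8-9e6bdeced846")
v6 = uuid1_to_uuid6(v1)
assert str(v6) == "1ec9414c-232a-6b00-b3c8-9e6bdeced846"
```

The low 64 bits are copied verbatim, which is why most of an existing v1 generator can be reused.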
Lastly, my advice on v8 is that I would examine/implement v6 or v7 first
before jumping to v8
because whatever you do for implementing v6 or v7 will help you implement a
better v8.
There are also a number of v8 prototype implementations (at the previous
link) if somebody wants to give them a scroll.
Happy to answer any other questions where I can provide input.
Thanks,
-----Original Message-----
From: Andrey M. Borodin <x4mmm@yandex-team.ru>
Sent: Friday, July 7, 2023 8:06 AM
To: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; Daniel Gustafsson <daniel@yesql.se>;
Matthias van de Meent <boekewurm+postgres@gmail.com>; Nikolay Samokhvalov
<samokhvalov@gmail.com>; Kyzer Davis (kydavis) <kydavis@cisco.com>; Andres
Freund <andres@anarazel.de>; Andrey Borodin <amborodin86@gmail.com>;
PostgreSQL Hackers <pgsql-hackers@postgresql.org>; brad@peabody.io;
wolakk@gmail.com
Subject: Re: UUID v7
On 6 Jul 2023, at 21:38, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
I think it would be reasonable to review this patch now.
+1.
Also, I think we should discuss UUID v8. UUID version 8 provides an
RFC-compatible format for experimental or vendor-specific use cases.
Revision 1 of IETF draft contained interesting code for v8: almost similar
to v7, but with fields for "node ID" and "rolling sequence number".
I think this is reasonable approach, thus I attach implementation of UUID v8
per [0]. But from my point of view this implementation has some flaws.
These two new fields "node ID" and "sequence" are there not for uniqueness,
but rather for data locality.
But they are placed at the end, in bytes 14 and 15, after randomly generated
numbers.
I think that "sequence" is there to help generate local ascending
identifiers when the real time clock do not provide enough resolution. So
"sequence" field must be placed after 6 bytes of time-generated identifier.
On a contrary "node ID" must differentiate identifiers generated on
different nodes. So it makes sense to place "node ID" before timing. So
identifiers generated on different nodes will tend to be in different
ranges.
Although, section "6.4. Distributed UUID Generation" states that "node ID"
is there to decrease the likelihood of a collision. So my intuition might be
wrong here.
Do we want to provide this "vendor-specific" UUID with tweaks for databases?
Or should we limit the scope with well defined UUID v7?
Best regards, Andrey Borodin.
[0]: https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-01
On 07.07.23 14:06, Andrey M. Borodin wrote:
Also, I think we should discuss UUID v8. UUID version 8 provides an RFC-compatible format for experimental or vendor-specific use cases. Revision 1 of IETF draft contained interesting code for v8: almost similar to v7, but with fields for "node ID" and "rolling sequence number".
I think this is reasonable approach, thus I attach implementation of UUID v8 per [0].
I suggest we keep this thread to v7, which has pretty straightforward
semantics for PostgreSQL. v8 by definition has many possible
implementations, so you're going to have to make pretty strong arguments
that yours is the best and only one, if you are going to claim the
gen_uuid_v8 function name.
On 10 Jul 2023, at 21:50, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
I suggest we keep this thread to v7, which has pretty straightforward semantics for PostgreSQL. v8 by definition has many possible implementations, so you're going to have to make pretty strong arguments that yours is the best and only one, if you are going to claim the gen_uuid_v8 function name.
Thanks Peter, I'll follow this course of action.
After discussion on GitHub with Sergey Prokhorenko [0] I understood that a counter is an optional but useful part of UUID v7. It actually promotes sortability of data generated at high speed.
The standard does not specify how big the counter should be. PFA patch with a 16-bit counter. Maybe it is worth doing an 18-bit counter - it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.
Best regards, Andrey Borodin.
[0]: https://github.com/x4m/pg_uuid_next/issues/1#issuecomment-1657074776
Attachments:
v3-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream, +79 -2)
On 30 Jul 2023, at 13:08, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
After discussion on GitHub with Sergey Prokhorenko [0] I understood that counter is optional, but useful part of UUID v7. It actually promotes sortability of data generated at high speed.
The standard does not specify how big counter should be. PFA patch with 16 bit counter. Maybe it worth doing 18bit counter - it will save us one byte of PRNG data. Currently we only take 2 bits out of the whole random byte.
Here's a new patch version. Now the counter is initialised with strong randomness on every time change (each ms). However, the first bit of the counter is always kept at zero. This is done to extend counter capacity (I left comments with a reference to the RFC with explanations).
Thanks!
Best regards, Andrey Borodin.
Attachments:
v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream, +105 -2)
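The counter behaviour described in this message could look roughly like the sketch below. It assumes the 16-bit counter width mentioned earlier in the thread; the names `seed_counter` and `CounterState` are hypothetical, and this is not the code from the v4 patch. Clearing the top bit of the random seed means at least 2^15 increments fit before the counter field would overflow.

```python
import os

COUNTER_BITS = 16  # assumed width, following the 16-bit counter in the thread


def seed_counter():
    """Random start value whose most significant bit is zero."""
    raw = int.from_bytes(os.urandom(2), "big")
    return raw & ((1 << (COUNTER_BITS - 1)) - 1)


class CounterState:
    """Per-process state: reseed on every new millisecond, else increment."""

    def __init__(self):
        self.last_ms = -1
        self.counter = 0

    def advance(self, now_ms):
        if now_ms != self.last_ms:
            self.last_ms = now_ms
            self.counter = seed_counter()
        else:
            self.counter += 1
        return self.last_ms, self.counter


state = CounterState()
first = state.advance(1_700_000_000_000)
second = state.advance(1_700_000_000_000)
assert second == (first[0], first[1] + 1)
```

Overflow of the counter and a clock that moves backwards are deliberately left out of this sketch.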
On 20 Aug 2023, at 23:56, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
<v4-0001-Implement-UUID-v7-as-per-IETF-draft.patch>
I've observed that pre-generating and buffering random numbers makes UUID generation more than 10 times faster.
Without buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 5286.572 ms (00:05.287)
With buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select count(*) from x;
Time: 390.091 ms
This can speed up gen_random_uuid() on the same scale too. PFA implementation of this technique.
Best regards, Andrey Borodin.
Attachments:
v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream, +105 -2)
v5-0002-Buffer-random-numbers.patch (application/octet-stream, +20 -4)
v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch (application/octet-stream, +1 -2)
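The buffering technique mentioned in this message amounts to fetching random bytes in bulk and handing them out in small slices, so that the expensive call to the system random source happens far less often. A minimal sketch follows. The class name `RandomBuffer` and the 4096-byte capacity are assumptions for illustration and do not reflect the buffer size or code in the v5 patch.

```python
import os


class RandomBuffer:
    """Hand out slices of a pre-generated block of strong random bytes."""

    def __init__(self, capacity=4096):
        self.capacity = capacity
        self.buf = b""
        self.pos = 0
        self.refills = 0  # how many times the system source was called

    def take(self, n):
        if n > self.capacity:
            raise ValueError("request larger than buffer capacity")
        if self.pos + n > len(self.buf):
            # Not enough bytes left: discard the tail and refill in one call.
            self.buf = os.urandom(self.capacity)
            self.pos = 0
            self.refills += 1
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk


rb = RandomBuffer()
chunks = [rb.take(10) for _ in range(1000)]
assert all(len(c) == 10 for c in chunks)
assert rb.refills == 3  # 3 bulk reads instead of 1000 small ones
```

One trade-off worth reviewing is that unused random bytes stay in process memory between calls.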
On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
<v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>
PFA next version.
Changes:
- implemented protection from time leap backwards when series is generated on the same backend
- counter overflow is now translated into ms step forward
Best regards, Andrey Borodin.
Attachments:
v6-0001-Implement-UUID-v7-as-per-IETF-draft.patch (application/octet-stream, +118 -2)
v6-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch (application/octet-stream, +1 -2)
v6-0002-Buffer-random-numbers.patch (application/octet-stream, +20 -4)
Andrey,
Thanks for all your work on this. I think this will be really useful.
From a user perspective, it would be great to add 2 things:
- A function to extract the timestamp from a V7 UUID (very useful for
defining constraints if partitioning by the uuid-embedded timestamps, for
instance).
- Can we add an optional timestamptz argument to gen_uuid_v7 so that you
can explicitly specify a time instead of always generating for the current
time? If the argument is NULL, then use current time. This could be useful
for backfilling and other applications.
Thanks,
Matvey Arye
Timescale software developer.
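Both requests above follow directly from the v7 layout, since the timestamp occupies the top 48 bits. The sketch below shows one possible shape for them in Python. The names `uuid7_at` and `uuid7_timestamp` are hypothetical and are not functions from the patch or from PostgreSQL.

```python
import datetime as dt
import os
import uuid

EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def uuid7_at(ts=None):
    """Generate a v7-style UUID for a given timezone-aware time (default: now)."""
    if ts is None:
        ts = dt.datetime.now(dt.timezone.utc)
    unix_ms = (ts - EPOCH) // dt.timedelta(milliseconds=1)
    value = (unix_ms & ((1 << 48) - 1)) << 80
    value |= int.from_bytes(os.urandom(10), "big")
    value = (value & ~(0xF << 76)) | (0x7 << 76)   # version 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant 0b10
    return uuid.UUID(int=value)


def uuid7_timestamp(u):
    """Recover the millisecond timestamp embedded in a version 7 UUID."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    return EPOCH + dt.timedelta(milliseconds=u.int >> 80)


when = dt.datetime(2023, 8, 30, 12, 0, 0, 123000, tzinfo=dt.timezone.utc)
backfilled = uuid7_at(when)
assert uuid7_timestamp(backfilled) == when
```

Only millisecond precision survives the round trip, so a partition constraint built on the extracted value should use millisecond-aligned bounds.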
On Wed, Aug 30, 2023 at 3:05 PM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:
On 21 Aug 2023, at 13:42, Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:
<v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch><v5-0002-Buffer-random-numbers.patch><v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch>
PFA next version.
Changes:
- implemented protection from time leap backwards when series is generated
on the same backend
- counter overflow is now translated into ms step forward
Best regards, Andrey Borodin.