Re: Add uuid_to_base32hex() and base32hex_to_uuid() built-in functions
Hi Sergey,
I'm writing to propose adding two new built-in functions to PostgreSQL that provide compact UUID encoding using the base32hex format.
Firstly, cc:'ing a few dozens of people is not the best way to get
attention to your patch. Please don't do this.
Secondly, in order to propose a patch please use `git format-patch`
and send it as an attachment. Then register it on the nearest open
commitfest [1].
The interface you are proposing is ugly and is not composable. The
right way of doing this IMO would be:
1. Implement uuid -> bytea and bytea -> uuid casting
2. Implement encode(bytea, 'base32') and decode(text, 'base32')
So the overall interface should be like this:
SELECT encode(uuidv7() :: bytea, 'base32');
The value of converting uuid to base32 is not obvious though, so I
would recommend explaining it in more detail. Consider starting a new
thread for each separate patch.
[1]: https://commitfest.postgresql.org/
--
Best regards,
Aleksander Alekseev
Hello!
On 23 Oct 2025, at 14:55, Aleksander Alekseev <aleksander@tigerdata.com> wrote:
Secondly, in order to propose a patch please use `git format-patch`
and send it as an attachment. Then register it on the nearest open
commitfest [1].
I think it's not about review yet, but more about discussing viability and the general approach.
The code itself is trivial in this case.
My first reaction was very skeptical too. Yes, this representation (28V4APV8JC9D792M89J185Q000) seems more developer-friendly than the default one (123e4567-e89b-12d3-a456-426614174000). But why should we bother with propagating one data format over another?
Yet, this format is RFC-blessed. It makes sense to consider providing an alternative to the unfriendly default format.
The interface you are proposing is ugly and is not composable. The
right way of doing this IMO would be:
1. Implement uuid -> bytea and bytea -> uuid casting
2. Implement encode(bytea, 'base32') and decode(text, 'base32')
So the overall interface should be like this:
SELECT encode(uuidv7() :: bytea, 'base32');
That's excellent feedback! Would such a conversion be idiomatic for Postgres users?
Are there any other alternative approaches?
The value of converting uuid to base32 is not obvious though, so I
would recommend explaining it in more detail.
Yes, and maybe some examples of other systems that adopted this format would be handy too. Sergey, can you please expand on why this particular format is prominent? RFC 4648 describes a bunch of formats.
Consider starting a new
thread for each separate patch.
I think this thread is fine for the discussion.
Thank you!
Best regards, Andrey Borodin.
On Thu, 23 Oct 2025 at 15:07, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
SELECT encode(uuidv7() :: bytea, 'base32');
That's excellent feedback! Would such a conversion be idiomatic for Postgres users?
Are there any other alternative approaches?
Agreed that extending the encode function is the way to go. An example
of that is the recently added support for base64url:
https://git.postgresql.org/cgit/postgresql.git/commit/?h=REL_18_0&id=e1d917182c1953b16b32a39ed2fe38e3d0823047
The value of converting uuid to base32 is not obvious though, so I
would recommend explaining it in more detail.
Yes, and maybe some examples of other systems that adopted this format would be handy too. Sergey, can you please expand on why this particular format is prominent? RFC 4648 describes a bunch of formats.
I've definitely used base32 to encode uuids myself. The primary
benefit being shorter strings, while still being able to spell them
out by voice to people without having to specify whether a letter is
upper or lowercase.
Hi,
I'm writing to propose adding two new built-in functions to PostgreSQL that provide compact UUID encoding using the base32hex format.
Firstly, cc:'ing a few dozens of people is not the best way to get
attention to your patch. Please don't do this.[...]
I checked the pgsql-hackers@ archive [1] and if I understand correctly
Sergey is not on the mailing list. So people that were not cc:'ed
didn't receive his e-mail. I attached the original text for those
interested and also for history.
Sergey, please make sure you are subscribed to the mailing list [2].
[1]: /messages/by-id/CAJ7c6TOramr1UTLcyB128LWMqita1Y7=arq3KHaU=qikf5yKOQ@mail.gmail.com
[2]: https://www.postgresql.org/list/
--
Best regards,
Aleksander Alekseev
Attachments:
hi-hackers.txt (text/plain)
The value of converting uuid to base32 is not obvious though, so I
would recommend explaining it in more detail.
Yes, and maybe some examples of other systems that adopted this format would be handy too.
DNSSEC (https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions)
many encoders and decoders
Sergey, can you please expand on why this particular format is prominent? RFC 4648 describes a bunch of formats.
Best regards, Andrey Borodin.
Base32hex:
1. Preserves sort order (unlike base64)
2. Compact
3. Standardized and therefore implemented consistently everywhere
4. Implemented in many programming languages' standard libraries
5. Does not require specifying character case during dictation
6. Has simple and high-performance encoding and decoding algorithms (necessary for system integration using JSON)
Having a single compact text encoding eliminates the incompatibility problem. The authors and contributors of RFC 9562 were categorically against having multiple encodings for UUIDs. They wanted to have only one compact, sort-order-preserving text encoding. For compatibility, they added the canonical UUID format. Due to time constraints, the compact encoding was not included in RFC 9562.
In databases, UUIDs should preferably be stored in binary format (the UUID type in PostgreSQL) according to RFC 9562.
Intermediate formats (bytea) reduce performance, which is the very reason we even abandoned the more compact base36 encoding.
On Thu, Oct 23, 2025 at 10:34 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
The value of converting uuid to base32 is not obvious though, so I
would recommend explaining it in more detail.
Yes, and maybe some examples of other systems that adopted this format would be handy too.
DNSSEC (https://en.wikipedia.org/wiki/Domain_Name_System_Security_Extensions)
many encoders and decoders
Sergey, can you please expand on why this particular format is prominent? RFC 4648 describes a bunch of formats.
Best regards, Andrey Borodin.
Base32hex:
1. Preserves sort order (unlike base64)
2. Compact
3. Standardized and therefore implemented consistently everywhere
4. Implemented in many programming languages' standard libraries
5. Does not require specifying character case during dictation
6. Has simple and high-performance encoding and decoding algorithms (necessary for system integration using JSON)
Having a single compact text encoding eliminates the incompatibility problem. The authors and contributors of RFC 9562 were categorically against having multiple encodings for UUIDs. They wanted to have only one compact, sort-order-preserving text encoding. For compatibility, they added the canonical UUID format. Due to time constraints, the compact encoding was not included in RFC 9562.
In databases, UUIDs should preferably be stored in binary format (the UUID type in PostgreSQL) according to RFC 9562.
Intermediate formats (bytea) reduce performance, which is the very reason we even abandoned the more compact base36 encoding.
Given that what uuid_to_base32hex() actually does is encoding the
input UUID, I find that it could be confusing if we have a similar
function other than encode() function. Also, we could end up
introducing as many encoding and decoding functions dedicated for UUID
as we want to support encoding methods, bloating the functions.
So as the first step, +1 for supporting base32hex for encode() and
decode() functions and supporting the UUID <-> bytea conversion. I
believe it would cover most use cases and the cost of UUID <-> bytea
conversion is negligible.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Masahiko,
I see you're in favor of base32hex encoding. That's great!
Your arguments make sense, and I generally support enhancing the standard encode() and decode() functions to handle base32hex. It seems like the right approach from a developer experience standpoint.
However, I'm unclear about some implementation aspects. Why add conversions between UUID and bytea data types? Wouldn't that require creating dedicated UUID <-> bytea conversion functions? Instead, could we implement encode() as polymorphic to handle UUID type inputs directly? For decode(), we'd need some way (a parameter?) to specify the UUID output type instead of bytea. Another option would be automatic type casting when inserting bytea data into UUID columns. Neither an extra parameter nor additional type casting seems ideal to me, though I don't have better alternatives.
But actually, for a short UUID text encoding to succeed, it's more important that it becomes the single, de facto standard. We should avoid supporting multiple encodings, just as the authors and contributors of RFC 9562 did: https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817
Therefore, whenever possible, encode() and decode() should support just one UUID text encoding, namely base32hex.
Best regards,
Sergey Prokhorenko
On Thu, Oct 23, 2025 at 3:46 PM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
Given that what uuid_to_base32hex() actually does is encoding the
input UUID, I find that it could be confusing if we have a similar
function other than encode() function. Also, we could end up
introducing as many encoding and decoding functions dedicated for UUID
as we want to support encoding methods, bloating the functions.
So as the first step, +1 for supporting base32hex for encode() and
decode() functions and supporting the UUID <-> bytea conversion. I
believe it would cover most use cases and the cost of UUID <-> bytea
conversion is negligible.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Masahiko,
I see you're in favor of base32hex encoding. That's great!
Your arguments make sense, and I generally support enhancing the standard encode() and decode() functions to handle base32hex. It seems like the right approach from a developer experience standpoint.
However, I'm unclear about some implementation aspects. Why add conversions between UUID and bytea data types? Wouldn't that require creating dedicated UUID <-> bytea conversion functions? Instead, could we implement encode() as polymorphic to handle UUID type inputs directly? For decode(), we'd need some way (a parameter?) to specify the UUID output type instead of bytea. Another option would be automatic type casting when inserting bytea data into UUID columns. Neither an extra parameter nor additional type casting seems ideal to me, though I don't have better alternatives.
While we can implement something like decode(uuid, text), I don't
think we can implement decode() in the way you proposed unless I'm
missing something.
I think the conversion support between UUID and bytea is useful in
general, not limited to encode()/decode() support. And users would be
able to create wrapper functions if they don't want to add casting for
every encode() and decode() calls. For example,
create function uuid_to_base32(uuid) returns text language sql immutable strict
begin atomic
select encode($1::bytea, 'base32hex');
end;
Since such functions are inlineable, the difference between executing
encode(uuid_data::bytea, 'base32hex') and encode(uuid_data,
'base32hex') would only be the conversion; one palloc and one memcpy.
But actually, for a short UUID text encoding to succeed, it's more important that it becomes the single, de facto standard. We should avoid supporting multiple encodings, just as the authors and contributors of RFC 9562 did: https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817 Therefore, whenever possible, encode() and decode() should support just one UUID text encoding, namely base32hex.
I guess it's ultimately the developer's choice, no? For example, if
they are using multiple databases (or data processing platforms) in
their system and 'hex' is the only encoding that all components can
encode and decode, they might choose 'hex' encoding.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Masahiko,
Developers will still be able to use the long canonical 'hex' UUID format for compatibility. But the short format is not a developer choice, but a convention. We mustn't allow a situation where 25% of systems use base32hex, 25% use Crockford's Base32, 25% use base36, and 25% even use erroneously sorted base64. That's a very real nightmare. You, too, have every reason not to want to increase the number of built-in functions in PostgreSQL.
But here is a solution that I hope will satisfy everyone:
encode('019535d9-3df7-79fb-b466-fa907fa17f9e', 'uuid_to_base32hex') -> 06AJBM9TUTSVND36VA87V8BVJO
decode('06AJBM9TUTSVND36VA87V8BVJO', 'base32hex_to_uuid') -> 019535d9-3df7-79fb-b466-fa907fa17f9e
I don't see any real business need for UUID <-> bytea conversions.
Best regards,
Sergey Prokhorenko
On Fri, Oct 24, 2025 at 12:17 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
Masahiko,
Developers will still be able to use the long canonical 'hex' UUID format for compatibility. But the short format is not a developer choice, but a convention. We mustn't allow a situation where 25% of systems use base32hex, 25% use Crockford's Base32, 25% use base36, and 25% even use erroneously sorted base64. That's a very real nightmare. You, too, have every reason not to want to increase the number of built-in functions in PostgreSQL.
But here is a solution that I hope will satisfy everyone:
encode('019535d9-3df7-79fb-b466-fa907fa17f9e', 'uuid_to_base32hex') -> 06AJBM9TUTSVND36VA87V8BVJO
Does it mean the first argument is uuid type data and when
'uuid_to_base32hex' is specified as the format the function requires a
uuid data at the first argument? I could not understand the difference
between specifying 'base32hex' and 'uuid_to_base32hex' when encoding
UUID data with base32hex encoding.
decode('06AJBM9TUTSVND36VA87V8BVJO', 'base32hex_to_uuid') -> 019535d9-3df7-79fb-b466-fa907fa17f9e
Suppose that the decode() takes text data at the first argument and
returns UUID data, the function signature would be decode(text, text)
-> uuid. But we cannot create two functions that have the same name
and the same argument types.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Friday 24 October 2025 at 09:24:15 pm GMT+3, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Fri, Oct 24, 2025 at 12:17 AM Sergey Prokhorenko
<sergeyprokhorenko@yahoo.com.au> wrote:
Masahiko,
Developers will still be able to use the long canonical 'hex' UUID format for compatibility. But the short format is not a developer choice, but a convention. We mustn't allow a situation where 25% of systems use base32hex, 25% use Crockford's Base32, 25% use base36, and 25% even use erroneously sorted base64. That's a very real nightmare. You, too, have every reason not to want to increase the number of built-in functions in PostgreSQL.
But here is a solution that I hope will satisfy everyone:
encode('019535d9-3df7-79fb-b466-fa907fa17f9e', 'uuid_to_base32hex') -> 06AJBM9TUTSVND36VA87V8BVJO
Does it mean the first argument is uuid type data and when
'uuid_to_base32hex' is specified as the format the function requires a
uuid data at the first argument?
Yes, that's right. PostgreSQL will automatically cast the string '019535d9-3df7-79fb-b466-fa907fa17f9e' to the uuid type, since the format is correct.
I could not understand the difference
between specifying 'base32hex' and 'uuid_to_base32hex' when encoding
UUID data with base32hex encoding.
1. Specifying 'base32hex' in the encode function means the first parameter is of bytea type, as usual.
But specifying 'uuid_to_base32hex' means the first parameter is of uuid type.
2. The encode function does not yet support the base32hex format. Therefore, it is not known whether '=' padding would be used.
But padding is not used when specifying 'uuid_to_base32hex'.
decode('06AJBM9TUTSVND36VA87V8BVJO', 'base32hex_to_uuid') -> 019535d9-3df7-79fb-b466-fa907fa17f9e
Suppose that the decode() takes text data at the first argument and
returns UUID data, the function signature would be decode(text, text)
-> uuid. But we cannot create two functions that have the same name
and the same argument types.
Yes, you're right. This is a problem that can't be solved without composite return values. We clearly took the wrong approach by coupling UUID conversion with encode/decode functions, which only apply to bytea. UUID and bytea are fundamentally different data types. Meanwhile, PostgreSQL has over 30 other type conversion functions that deal with other data types. For example, array_to_string, string_to_array, jsonb_to_record, to_char, to_timestamp, and to_hex. In this situation, the best solution would be to revert to the original uuid_to_base32hex() and base32hex_to_uuid() functions rather than deal with type incompatibility issues.
I think that type conversions and data encodings serve different
purposes. Type conversions express semantic transformations between
data types (e.g., text -> timestamp, jsonb -> record), while encodings
are simply representations of binary data as text. For the latter,
PostgreSQL already provides a well-defined abstraction through
encode()/decode(). Mixing encoding logic with type-specific
conversions would blur that boundary.
Also, if we start adding dedicated functions for each supported
encoding (uuid_to_base32hex, uuid_to_hex etc.), the number of
functions could easily multiply. That’s exactly what encode() and
decode() were designed to avoid.
While I agree that base32hex should be the recommended encoding, I'm really not
sure it's a good design that PostgreSQL core should enforce it as the
only built-in method. It seems better to me to provide flexible
primitives, encode()/decode() plus UUID <-> bytea casts, and document
base32hex as the canonical convention (if necessary). Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure. I'd like to
hear opinions from other hackers too.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 25 Oct 2025, at 04:31, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure.
I like the idea, so I drafted a prototype for discussion.
Though I do not see what other methods should be provided along with the added one...
Best regards, Andrey Borodin.
Attachments:
v2-0001-Add-uuid_encode-and-uuid_decode-functions.patch (+478, -1)
If base32hex becomes the default string representation for UUIDs in PostgreSQL, then the canonical UUID string representation may be added to these functions for backward compatibility.
Best regards,
Sergey Prokhorenko
Hi,
Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure.
I like the idea, so I drafted a prototype for discussion.
Though I do not see what else methods should be provided along with added one...
I see no reason why we should forbid the use of base32 encoding with
bytea, or have different functions for this, e.g. uuid_encode() and
encode(). To me it looks like poor API design.
--
Best regards,
Aleksander Alekseev
It seems that bytea is your personal interest, since you continue to impose your bytea when a better solution has already been found with uuid_encode() and uuid_decode().
The bytea proposal has a lot of drawbacks:
1. It requires unnecessary casting in addition to encoding/decoding. This complicates the interface and creates unnecessary cognitive load on developers. It also creates additional CPU load, although perhaps only a small amount.
2. The encoding function encourages developers to use the slightly more compact base64 encoding (see https://www.postgresql.org/docs/current/functions-binarystring.html), which doesn't preserve sort order, isn't URL-safe, is case-sensitive, and requires specifying the case of letters when dictating. This also creates a serious problem of incompatibility between UUID encodings.
3. The hashing functions used by bytea create a temptation to implement popular, idiotic ideas for hashing UUIDs to obscure their creation date and to hide internal keys from clients.
4. Other various functions for bytea allow the construction of Frankenstein identifiers that compete with UUIDv7, which could negatively impact the reputation of UUIDs.
The bytea type has nothing in common with the uuid type other than the binary encoding. Therefore, the bytea <-> uuid cast can only encourage abuse and errors, creating the illusion of unlimited developer power.
The bytea proposal has no merit whatsoever. It's the worst, most unsafe, and most harmful design, undermining efforts to widely adopt UUIDv7 and improve PostgreSQL.
Hi Sergey,
It seems that bytea is your personal interest, since you continue to impose your bytea when a better solution has already been found with uuid_encode() and uuid_decode().
In the previous messages Masahiko Sawada wrote:
Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure. I'd like to
hear opinions from other hackers too.
I merely shared my personal opinion on why I think this is a bad idea.
Let's see what other people think.
--
Best regards,
Aleksander Alekseev
You didn't give any arguments in favor of your opinion.
On Sat, Oct 25, 2025 at 11:07 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
On 25 Oct 2025, at 04:31, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Or providing
'uuid_encode(uuid, format text) -> text' and 'uuid_decode(text, format
text) -> uuid' might make sense too, but I'm not sure.
I like the idea, so I drafted a prototype for discussion.
Though I do not see what else methods should be provided along with added one...
Thank you for drafting the patch! But I find it potentially confusing
to have different encoding methods for bytea and UUID types. I don't
see a compelling reason why the core should support base32hex
exclusively for the UUID data type, nor why base32hex should be the
only encoding method that the core provides for UUIDs (while we can
use it by default).
If we implement uuid_encode() and uuid_decode(), we might end up
creating similar encoding and decoding functions for other data types
as well, which doesn't seem like the best approach. I still believe
that extending the existing encode() and decode() functions is a
better starting point.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Masahiko,
I wanted to highlight an important discussion among the authors and contributors of RFC 9562 regarding UUID text encoding:
https://github.com/uuid6/new-uuid-encoding-techniques-ietf-draft/discussions/17#discussioncomment-10614817
The RFC 9562 authors and contributors reached consensus that standardizing an alternate short text format for UUIDs is important. While the community debated between base32hex (RFC 4648) and Crockford's Base32, both were recognized for preserving lexicographical sort order, a critical property for database primary keys and URL-safe identifiers. Time constraints prevented inclusion in RFC 9562, but the discussion established that base32hex is the existing standard format already defined in RFC 4648, Section 7, specifically designed for sort-preserving encoding.
This context is crucial because it underscores that the uuid type, as a first-class concept, deserves its own standardized text encoding.
Regarding the proposal to couple UUID encoding with the bytea type through encode()/decode() functions: I understand the appeal of reusing existing infrastructure, but this creates a conceptual mismatch. UUID is a distinct semantic type in PostgreSQL, not merely binary data. The bytea type has existed for decades without base32hex encoding, and that's worked fine, because bytea represents arbitrary binary data, not universally unique identifiers with specific structural properties and needs.
Consider PostgreSQL's own design philosophy. The documentation states:
"9.5. Binary String Functions and Operators This section describes functions and operators for examining and manipulating binary strings, that is values of type bytea. Many of these are equivalent, in purpose and syntax, to the text-string functions described in the previous section."
PostgreSQL maintains parallel function sets for text strings and bytea precisely because they serve different purposes, despite the implementation overhead. The uuid type deserves the same treatment: it's not just another binary blob, but a type with specific semantics (uniqueness, version bits, variant encoding) and use cases (distributed identifiers, sortable keys, URL-safe representations).
Why should uuid be treated as a second-class citizen and forced through bytea conversion, when text and bytea each have their own dedicated function families?
You've been very careful in your previous arguments to separate data type conversion from encoding/decoding operations. I appreciate that rigor. However, the current proposal to route UUID encoding through bytea contradicts that principle. It merges two fundamentally different data types for convenience rather than correctness.
If someone wants to add base32hex encoding/decoding to bytea for general binary data operations, that's a worthwhile but separate discussion. The uuid type, however, needs native base32hex support to fulfill its role as a first-class PostgreSQL type with a standardized compact text representation, as recommended by the RFC 9562 community.
I would value your thoughts on these arguments.
Best regards,
Sergey Prokhorenko