encode/decode support for base64url
Hello,
Sometimes support for base64url from RFC 4648 would be useful.
Does anyone else need a patch like this?
--
Przemysław Sztoch | Mobile +48 509 99 00 66
On 4 Mar 2025, at 09:54, Przemysław Sztoch <przemyslaw@sztoch.pl> wrote:
Sometimes support for base64url from RFC 4648 would be useful.
Does anyone else need a patch like this?
While not a frequent ask, it has been mentioned in the past. I think it would
make sense to add so please do submit a patch for it for consideration.
--
Daniel Gustafsson
Hi,
Sometimes support for base64url from RFC 4648 would be useful.
Does anyone else need a patch like this?
While not a frequent ask, it has been mentioned in the past. I think it would
make sense to add so please do submit a patch for it for consideration.
IMO it would be nice to have.
Would you like to submit such a patch or are you merely suggesting an
idea for others to implement?
--
Best regards,
Aleksander Alekseev
On 7 Mar 2025, at 4:40 PM, Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi,
Sometimes support for base64url from RFC 4648 would be useful.
Does anyone else need a patch like this?
While not a frequent ask, it has been mentioned in the past. I think it would
make sense to add so please do submit a patch for it for consideration.
IMO it would be nice to have.
Would you like to submit such a patch or are you merely suggesting an
idea for others to implement?
--
Best regards,
Aleksander Alekseev
Just to confirm:
In a plain SQL flavor, we’re talking about something like this, correct?
CREATE FUNCTION base64url_encode(input bytea) RETURNS text AS $$
SELECT regexp_replace(
replace(replace(encode(input, 'base64'), '+', '-'), '/', '_'),
'=+$', '', 'g'
);
$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION base64url_decode(input text) RETURNS bytea AS $$
SELECT decode(
rpad(replace(replace(input, '-', '+'), '_', '/'), (length(input) + 3) & ~3, '='),
'base64'
);
$$ LANGUAGE sql IMMUTABLE;
With minimal testing, this yields the same results as https://base64.guru/standards/base64url/encode
select base64url_encode('post+gres')
base64url_encode
------------------
cG9zdCtncmVz
(1 row)
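As a side note on the decode function above: the expression `(length(input) + 3) & ~3` rounds the input length up to the next multiple of four, which is the total length `rpad` needs in order to restore the stripped `=` padding. A minimal C sketch of the same arithmetic follows; the function names are illustrative only and do not come from any patch in this thread.

```c
#include <stddef.h>

/* Round len up to the next multiple of 4, like (length(input) + 3) & ~3. */
size_t
padded_len(size_t len)
{
	return (len + 3) & ~(size_t) 3;
}

/* Number of '=' characters needed to reach that padded length. */
size_t
pad_count(size_t len)
{
	return padded_len(len) - len;
}
```

For the 12-character result shown above, no padding is needed; a 22-character input would need two `=` characters.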
On Sun, Mar 9, 2025 at 12:28 AM Florents Tselai <florents.tselai@gmail.com>
wrote:
On 7 Mar 2025, at 4:40 PM, Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi,
Sometimes support for base64url from RFC 4648 would be useful.
Does anyone else need a patch like this?
While not a frequent ask, it has been mentioned in the past. I think it would
make sense to add so please do submit a patch for it for consideration.
IMO it would be nice to have.
Would you like to submit such a patch or are you merely suggesting an
idea for others to implement?
--
Best regards,
Aleksander Alekseev
Just to confirm:
In a plain SQL flavor, we’re talking about something like this, correct?
CREATE FUNCTION base64url_encode(input bytea) RETURNS text AS $$
SELECT regexp_replace(
replace(replace(encode(input, 'base64'), '+', '-'), '/', '_'),
'=+$', '', 'g'
);
$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION base64url_decode(input text) RETURNS bytea AS $$
SELECT decode(
rpad(replace(replace(input, '-', '+'), '_', '/'), (length(input) + 3) & ~3, '='),
'base64'
);
$$ LANGUAGE sql IMMUTABLE;
With minimal testing, this yields the same results as https://base64.guru/standards/base64url/encode
select base64url_encode('post+gres')
base64url_encode
------------------
cG9zdCtncmVz
(1 row)
Here's a C implementation for this, along with some tests and documentation.
Tests are copied from cpython's implementation of urlsafe_b64encode and
urlsafe_b64decode.
The signatures look like this:
SELECT base64url_encode('www.postgresql.org'::bytea) →
d3d3LnBvc3RncmVzcWwub3Jn
SELECT convert_from(base64url_decode('d3d3LnBvc3RncmVzcWwub3Jn'), 'UTF8') →
www.postgresql.org
Attachments:
v1-0001-Add-base64url_encode-base64url_decode-functions-a.patch (application/octet-stream)
From 012d585897684a414263dd7f7be7a08ac6761d2d Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Mon, 10 Mar 2025 12:55:53 +0200
Subject: [PATCH v1] Add base64url_encode, base64url_decode functions along
with some tests and documentation
---
doc/src/sgml/func.sgml | 44 +++++++++++-
src/backend/utils/adt/encode.c | 92 +++++++++++++++++++++++++
src/include/catalog/pg_proc.dat | 7 ++
src/test/regress/expected/strings.out | 97 +++++++++++++++++++++++++++
src/test/regress/sql/strings.sql | 32 +++++++++
5 files changed, 271 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 51dd8ad6571..4c50e9537ff 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4794,7 +4794,49 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
<returnvalue>\x5678</returnvalue>
</para></entry>
</row>
- </tbody>
+
+ <row>
+ <entry role="func_table_entry">
+ <para role="func_signature">
+ <indexterm>
+ <primary>base64url_encode</primary>
+ </indexterm>
+ <function>base64url_encode</function> ( <parameter>input</parameter> <type>bytea</type> )
+ <returnvalue>text</returnvalue>
+ </para>
+ <para>
+ Encodes the given <parameter>input</parameter> in Base64URL format, replacing
+ '+' with '-', '/' with '_', and removing padding ('='). This is useful for
+ safe encoding in URLs and filenames.
+ </para>
+ <para>
+ <literal>base64url_encode('www.postgresql.org'::bytea)</literal>
+ <returnvalue>d3d3LnBvc3RncmVzcWwub3Jn</returnvalue>
+ </para>
+ </entry>
+ </row>
+
+ <row>
+ <entry role="func_table_entry">
+ <para role="func_signature">
+ <indexterm>
+ <primary>base64url_decode</primary>
+ </indexterm>
+ <function>base64url_decode</function> ( <parameter>input</parameter> <type>text</type> )
+ <returnvalue>bytea</returnvalue>
+ </para>
+ <para>
+ Decodes the given Base64URL-encoded <parameter>input</parameter> back into
+ its original binary format. Converts '-' to '+', '_' to '/', and restores
+ padding if needed.
+ </para>
+ <para>
+ <literal>convert_from(base64url_decode('d3d3LnBvc3RncmVzcWwub3Jn'), 'UTF8')</literal>
+ <returnvalue>www.postgresql.org</returnvalue>
+ </para>
+ </entry>
+ </row>
+ </tbody>
</tgroup>
</table>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..2d537bf3096 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -19,6 +19,7 @@
#include "utils/builtins.h"
#include "utils/memutils.h"
#include "varatt.h"
+#include "common/base64.h"
/*
@@ -140,6 +141,97 @@ binary_decode(PG_FUNCTION_ARGS)
PG_RETURN_BYTEA_P(result);
}
+/* Base64URL Encode */
+Datum
+base64url_encode(PG_FUNCTION_ARGS)
+{
+ bytea *input;
+ int input_len;
+ int base64_len;
+ char *base64;
+ int encoded_len;
+ char *base64url;
+ int j, i;
+
+ /* Get input */
+ input = PG_GETARG_BYTEA_P(0);
+ input_len = VARSIZE(input) - VARHDRSZ;
+
+ /* Compute required buffer size */
+ base64_len = pg_b64_enc_len(input_len);
+ base64 = palloc(base64_len);
+
+ /* Encode the data */
+ encoded_len = pg_b64_encode(VARDATA(input), input_len, base64, base64_len);
+ if (encoded_len < 0)
+ ereport(ERROR, (errmsg("base64 encoding failed")));
+
+ /* Convert to Base64URL format (replace '+' → '-', '/' → '_', remove '=') */
+ base64url = palloc(encoded_len); /* Allocate same size */
+ j = 0;
+ for (i = 0; i < encoded_len; i++)
+ {
+ if (base64[i] == '+')
+ base64url[j++] = '-';
+ else if (base64[i] == '/')
+ base64url[j++] = '_';
+ else if (base64[i] != '=')
+ base64url[j++] = base64[i];
+ }
+
+ /* Return properly sized text datum */
+ PG_RETURN_TEXT_P(cstring_to_text_with_len(base64url, j));
+}
+
+Datum
+base64url_decode(PG_FUNCTION_ARGS)
+{
+ text *input;
+ char *input_str;
+ int input_len;
+ int decoded_len, actual_decoded_len;
+ int pad_len;
+ char *base64;
+ int i;
+ bytea *decoded;
+
+ /* Get input */
+ input = PG_GETARG_TEXT_P(0);
+ input_str = text_to_cstring(input);
+ input_len = strlen(input_str);
+
+ /* Prepare a Base64 string with padding */
+ pad_len = (4 - (input_len % 4)) % 4;
+ base64 = palloc(input_len + pad_len);
+
+ /* Convert Base64URL to standard Base64 */
+ for (i = 0; i < input_len; i++)
+ {
+ if (input_str[i] == '-')
+ base64[i] = '+';
+ else if (input_str[i] == '_')
+ base64[i] = '/';
+ else
+ base64[i] = input_str[i];
+ }
+
+ /* Add necessary padding */
+ while (pad_len-- > 0)
+ base64[i++] = '=';
+
+ /* Decode Base64 */
+ decoded_len = pg_b64_dec_len(i);
+ decoded = (bytea *) palloc(VARHDRSZ + decoded_len);
+ actual_decoded_len = pg_b64_decode(base64, i, VARDATA(decoded), decoded_len);
+
+ if (actual_decoded_len < 0)
+ ereport(ERROR, (errmsg("invalid base64url input")));
+
+ /* Set correct size */
+ SET_VARSIZE(decoded, VARHDRSZ + actual_decoded_len);
+
+ PG_RETURN_BYTEA_P(decoded);
+}
/*
* HEX
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cede992b6e2..bdccf7ac16d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1184,6 +1184,13 @@
proname => 'int8', prorettype => 'int8',
proargtypes => 'bytea', prosrc => 'bytea_int8' },
+{ oid => '8590', descr => 'base64url_encode',
+ proname => 'base64url_encode', prorettype => 'text',
+ proargtypes => 'bytea', prosrc => 'base64url_encode' },
+{ oid => '8591', descr => 'base64url_decode',
+ proname => 'base64url_decode', prorettype => 'bytea',
+ proargtypes => 'text', prosrc => 'base64url_decode' },
+
{ oid => '449', descr => 'hash',
proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f8cba9f5b24..5fa12023b3f 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2792,3 +2792,100 @@ ERROR: invalid Unicode code point: 2FFFFF
SELECT unistr('wrong: \xyz');
ERROR: invalid Unicode escape
HINT: Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+-- base64url_encode
+SELECT base64url_encode('www.postgresql.org'::bytea);
+ base64url_encode
+--------------------------
+ d3d3LnBvc3RncmVzcWwub3Jn
+(1 row)
+
+SELECT base64url_encode(E'\\x00'::bytea); -- Expected: 'AA'
+ base64url_encode
+------------------
+ AA
+(1 row)
+
+SELECT base64url_encode('a'::bytea); -- Expected: 'YQ'
+ base64url_encode
+------------------
+ YQ
+(1 row)
+
+SELECT base64url_encode('ab'::bytea); -- Expected: 'YWI'
+ base64url_encode
+------------------
+ YWI
+(1 row)
+
+SELECT base64url_encode('abc'::bytea); -- Expected: 'YWJj'
+ base64url_encode
+------------------
+ YWJj
+(1 row)
+
+SELECT base64url_encode(''::bytea); -- Expected: ''
+ base64url_encode
+------------------
+
+(1 row)
+
+SELECT base64url_encode(
+ 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#0^&*();:<>,. []{}'::bytea
+ );
+ base64url_encode
+----------------------------------------------------------------------------------------------------------------
+ YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3ODkhQCMwXiYqKCk7Ojw-LC4gW117fQ
+(1 row)
+
+-- Expected:
+-- 'YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3ODkhQCMwXiYqKCk7Ojw-'
+-- Note: The last character is '-' instead of '/' due to Base64URL encoding.
+-- base64url_decode
+SELECT base64url_decode('d3d3LnBvc3RncmVzcWwub3Jn'); -- Expected: 'www.postgresql.org'
+ base64url_decode
+--------------------
+ www.postgresql.org
+(1 row)
+
+SELECT base64url_decode('AA'); -- Expected: E'\\x00'
+ base64url_decode
+------------------
+ \000
+(1 row)
+
+SELECT base64url_decode('YQ'); -- Expected: 'a'
+ base64url_decode
+------------------
+ a
+(1 row)
+
+SELECT base64url_decode('YWI'); -- Expected: 'ab'
+ base64url_decode
+------------------
+ ab
+(1 row)
+
+SELECT base64url_decode('YWJj'); -- Expected: 'abc'
+ base64url_decode
+------------------
+ abc
+(1 row)
+
+SELECT base64url_decode(''); -- Expected: ''
+ base64url_decode
+------------------
+
+(1 row)
+
+SELECT convert_from(
+ base64url_decode(
+ 'YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3ODkhQCMwXiYqKCk7Ojw-'
+ ),
+ 'UTF8'
+ );
+ convert_from
+-----------------------------------------------------------------------------
+ abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#0^&*();:<>
+(1 row)
+
+-- Expected: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#0^&*();:<>,. []{}'
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 4deb0683d57..5a199fcc83a 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -877,3 +877,35 @@ SELECT unistr('wrong: \udb99\u0061');
SELECT unistr('wrong: \U0000db99\U00000061');
SELECT unistr('wrong: \U002FFFFF');
SELECT unistr('wrong: \xyz');
+
+-- base64url_encode
+SELECT base64url_encode('www.postgresql.org'::bytea);
+SELECT base64url_encode(E'\\x00'::bytea); -- Expected: 'AA'
+SELECT base64url_encode('a'::bytea); -- Expected: 'YQ'
+SELECT base64url_encode('ab'::bytea); -- Expected: 'YWI'
+SELECT base64url_encode('abc'::bytea); -- Expected: 'YWJj'
+SELECT base64url_encode(''::bytea); -- Expected: ''
+SELECT base64url_encode(
+ 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#0^&*();:<>,. []{}'::bytea
+ );
+-- Expected:
+-- 'YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3ODkhQCMwXiYqKCk7Ojw-'
+-- Note: The last character is '-' instead of '/' due to Base64URL encoding.
+
+-- base64url_decode
+SELECT base64url_decode('d3d3LnBvc3RncmVzcWwub3Jn'); -- Expected: 'www.postgresql.org'
+
+SELECT base64url_decode('AA'); -- Expected: E'\\x00'
+SELECT base64url_decode('YQ'); -- Expected: 'a'
+SELECT base64url_decode('YWI'); -- Expected: 'ab'
+SELECT base64url_decode('YWJj'); -- Expected: 'abc'
+SELECT base64url_decode(''); -- Expected: ''
+SELECT convert_from(
+ base64url_decode(
+ 'YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXpBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWjAxMjM0NTY3ODkhQCMwXiYqKCk7Ojw-'
+ ),
+ 'UTF8'
+ );
+-- Expected: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#0^&*();:<>,. []{}'
+
+
--
2.48.1
On 10 Mar 2025, at 12:28, Florents Tselai <florents.tselai@gmail.com> wrote:
Here's a C implementation for this, along with some tests and documentation.
Tests are copied from cpython's implementation of urlsafe_b64encode and urlsafe_b64decode.
+ <function>base64url_encode</function> ( <parameter>input</parameter> <type>bytea</type> )
Shouldn't this be modelled around how base64 works with the encode() and
decode() functions, ie encode('123\001', 'base64')?
https://www.postgresql.org/docs/devel/functions-binarystring.html
--
Daniel Gustafsson
On Mon, Mar 10, 2025, 14:32 Daniel Gustafsson <daniel@yesql.se> wrote:
On 10 Mar 2025, at 12:28, Florents Tselai <florents.tselai@gmail.com> wrote:
Here's a C implementation for this, along with some tests and documentation.
Tests are copied from cpython's implementation of urlsafe_b64encode and
urlsafe_b64decode.
+ <function>base64url_encode</function> ( <parameter>input</parameter> <type>bytea</type> )
Shouldn't this be modelled around how base64 works with the encode() and
decode() functions, ie encode('123\001', 'base64')?
https://www.postgresql.org/docs/devel/functions-binarystring.html
--
Daniel Gustafsson
Oh well - you're probably right.
I guess I was blinded by my convenience.
Adding a 'base64url' option there is more appropriate.
Oh well - you're probably right.
I guess I was blinded by my convenience.
Adding a 'base64url' option there is more appropriate.
I agree with it too. It is neater to add "base64url" as a new option for
encode() and decode() SQL functions in encode.c.
In addition, you may also want to add the C versions of base64url encode
and decode functions to "src/common/base64.c" as new API calls so that
the frontend, backend applications and extensions can also have access
to these base64url conversions.
Cary Huang
-------------
HighGo Software Inc. (Canada)
cary.huang@highgo.ca
www.highgo.ca
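To make the suggestion about a dependency-free helper in src/common more concrete, here is a minimal sketch of an unpadded base64url encoder that uses no backend facilities (no palloc, no ereport). The function name `b64url_encode_sketch` and its signature are hypothetical; this is not an existing PostgreSQL API and not code from any patch in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* URL-safe alphabet from RFC 4648, Section 5 */
static const char b64url_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Hypothetical helper, for illustration only.
 * Encodes src[0..srclen) as unpadded base64url into dst.
 * Returns the number of characters written, or -1 if dst is too small.
 * The output is not null-terminated.
 */
int
b64url_encode_sketch(const unsigned char *src, size_t srclen,
					 char *dst, size_t dstlen)
{
	size_t		tail = srclen % 3;
	size_t		needed = (srclen / 3) * 4 + (tail ? tail + 1 : 0);
	size_t		i = 0;
	size_t		j = 0;

	if (dstlen < needed)
		return -1;

	/* Each full 3-byte group becomes 4 characters */
	while (srclen - i >= 3)
	{
		uint32_t	v = ((uint32_t) src[i] << 16) |
						((uint32_t) src[i + 1] << 8) |
						(uint32_t) src[i + 2];

		dst[j++] = b64url_alphabet[(v >> 18) & 0x3F];
		dst[j++] = b64url_alphabet[(v >> 12) & 0x3F];
		dst[j++] = b64url_alphabet[(v >> 6) & 0x3F];
		dst[j++] = b64url_alphabet[v & 0x3F];
		i += 3;
	}

	/* A 1- or 2-byte tail becomes 2 or 3 characters, with no '=' padding */
	if (tail == 1)
	{
		uint32_t	v = (uint32_t) src[i] << 16;

		dst[j++] = b64url_alphabet[(v >> 18) & 0x3F];
		dst[j++] = b64url_alphabet[(v >> 12) & 0x3F];
	}
	else if (tail == 2)
	{
		uint32_t	v = ((uint32_t) src[i] << 16) |
						((uint32_t) src[i + 1] << 8);

		dst[j++] = b64url_alphabet[(v >> 18) & 0x3F];
		dst[j++] = b64url_alphabet[(v >> 12) & 0x3F];
		dst[j++] = b64url_alphabet[(v >> 6) & 0x3F];
	}

	return (int) j;
}
```

Encoding directly with the URL-safe alphabet avoids the second pass that rewrites '+' and '/' and trims '=' after a standard base64 encode; whether that is preferable to reusing the existing encoder is a design choice for the patch author.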
On 07.03.2025 15:40, Aleksander Alekseev wrote:
Hi,
Sometimes support for base64url from RFC 4648 would be useful.
Does anyone else need a patch like this?
While not a frequent ask, it has been mentioned in the past. I think it would
make sense to add so please do submit a patch for it for consideration.
IMO it would be nice to have.
Would you like to submit such a patch or are you merely suggesting an
idea for others to implement?
1. It is my current workaround:
SELECT convert_from(decode(rpad(translate(jwt_data, E'-_\n', '+/'),
(ceil(length(translate(jwt_data, E'-_\n', '+/')) / 4::float) *
4)::integer, '='::text), 'base64'), 'UTF-8')::jsonb AS jwt_json
But it's not very elegant. I won't propose my own patch, but if someone
does it, I'll be very grateful for it. :-)
2. My colleagues also have a proposal to add hex_space, dec and dec_space.
hex_space and dec_space for obvious readability in some conditions.
dec and dec_space are also sometimes much more convenient for debugging
and interpreting binary data by humans.
3. In addition to base64, sometimes base32 would be useful (both from
rfc4648), which doesn't have such problems:
The resulting character set is all one case, which can often be
beneficial when using a case-insensitive filesystem, DNS names, spoken
language, or human memory. The result can be used as a file name because
it cannot possibly contain the '/' symbol, which is the Unix path
separator.
--
Przemysław Sztoch | Mobile +48 509 99 00 66
On Tue, Mar 11, 2025 at 12:51 AM Cary Huang <cary.huang@highgo.ca> wrote:
Oh well - you're probably right.
I guess I was blinded by my convenience.
Adding a 'base64url' option there is more appropriate.
I agree with it too. It is neater to add "base64url" as a new option for
encode() and decode() SQL functions in encode.c.
Attaching a v2 with that.
In addition, you may also want to add the C versions of base64url encode
and decode functions to "src/common/base64.c" as new API calls so that
the frontend, backend applications and extensions can also have access
to these base64url conversions.
We could expose this in base64.c - it'll need some more checking.
A few more test cases, especially around padding, are necessary.
I'll come back to this.
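On the padding cases: for unpadded input, the decoded size follows from the input length modulo 4. A remainder of 2 yields one extra byte, a remainder of 3 yields two, and a remainder of 1 can never be produced by a valid encoder, because a single 6-bit character cannot carry a whole byte. The sketch below spells this out; `b64url_decoded_len_sketch` is a hypothetical name, and returning -1 for the invalid length is an assumption about how such a helper might signal it, not behavior taken from the attached patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns the exact decoded size of an unpadded base64url string of
 * srclen characters, or -1 when no valid encoding has that length.
 */
long
b64url_decoded_len_sketch(size_t srclen)
{
	size_t		rem = srclen % 4;
	size_t		len = (srclen / 4) * 3;

	if (rem == 1)
		return -1;				/* impossible length for base64 data */
	if (rem == 2)
		len += 1;				/* 2 trailing chars -> 1 decoded byte */
	else if (rem == 3)
		len += 2;				/* 3 trailing chars -> 2 decoded bytes */

	return (long) len;
}
```

Inputs whose length leaves a remainder of 1 are probably worth a dedicated regression test, since they cannot be fixed by appending '=' characters.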
Attachments:
v2-0001-Add-base64url-in-encode-decode-functions.patch (application/octet-stream)
From 08c1fa81429b3422b6c313f6117f891377ec717e Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Tue, 11 Mar 2025 09:39:22 +0200
Subject: [PATCH v2 1/2] Add "base64url" in encode / decode functions.
Documentation included. Additional tests probably needed, especially wrt
padding. The same for docs.
---
doc/src/sgml/func.sgml | 22 ++++++++
src/backend/utils/adt/encode.c | 81 +++++++++++++++++++++++++++
src/test/regress/expected/strings.out | 52 +++++++++++++++++
src/test/regress/sql/strings.sql | 13 ++++-
4 files changed, 167 insertions(+), 1 deletion(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 51dd8ad6571..5df09db633e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4934,6 +4934,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -4991,6 +4992,27 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is a URL-safe variant of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">RFC 4648
+ Section 5</ulink>. Unlike standard <literal>base64</literal>, it replaces
+ <literal>'+'</literal> with <literal>'-'</literal> and <literal>'/'</literal> with <literal>'_'</literal>
+ to ensure safe usage in URLs and filenames. Additionally, the padding character
+ <literal>'='</literal> is omitted.
+ The <function>decode</function> function automatically handles missing padding
+ by appending the necessary '=' characters before decoding. If the input contains
+ invalid characters or has incorrect padding (when explicitly provided), an error
+ is raised.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..2547e746653 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -415,6 +415,81 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ return ((uint64) srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ size_t rem = srclen % 4;
+
+ uint64 len = (srclen / 4) * 3;
+
+ /* Adjust for missing padding */
+ if (rem == 2)
+ len += 1; /* 2 extra chars → 1 decoded byte */
+ else if (rem == 3)
+ len += 2; /* 3 extra chars → 2 decoded bytes */
+
+ return len;
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ uint64 encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++)
+ {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
+
+ /* Trim '=' padding */
+ while (encoded_len > 0 && dst[encoded_len - 1] == '=')
+ encoded_len--;
+
+ /* Ensure null termination */
+ dst[encoded_len] = '\0';
+
+ return encoded_len;
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ char *base64 = palloc(len + 4); /* Allocate extra space */
+ size_t i;
+ size_t pad_len;
+
+ /* Convert Base64URL back to Base64 */
+ for (i = 0; i < len; i++)
+ {
+ if (src[i] == '-')
+ base64[i] = '+';
+ else if (src[i] == '_')
+ base64[i] = '/';
+ else
+ base64[i] = src[i];
+ }
+
+ /* Restore padding if needed */
+ pad_len = (4 - (len % 4)) % 4;
+ while (pad_len--)
+ base64[i++] = '=';
+
+ uint64 decoded_len = pg_base64_decode(base64, i, dst);
+ pfree(base64); /* Free allocated memory */
+
+ return decoded_len;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +681,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index f8cba9f5b24..579840ae394 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2323,6 +2323,58 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+-- base64url
+SELECT encode('\x1234567890abcdef00', 'base64url');
+ encode
+--------------
+ EjRWeJCrze8A
+(1 row)
+
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url');
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Special characters in binary data
+SELECT encode('\x00ff88ee77', 'base64url'); -- Includes null byte (\x00) and high-bit characters
+ encode
+---------
+ AP-I7nc
+(1 row)
+
+SELECT decode(encode('\x00ff88ee77', 'base64url'), 'base64url');
+ decode
+--------------
+ \x00ff88ee77
+(1 row)
+
+-- Padding edge case (length % 3 = 1)
+SELECT encode('\x66', 'base64url'); -- Expected to remove padding
+ encode
+--------
+ Zg
+(1 row)
+
+SELECT decode(encode('\x66', 'base64url'), 'base64url');
+ decode
+--------
+ \x66
+(1 row)
+
+-- Padding edge case (length % 3 = 2)
+SELECT encode('\x6666', 'base64url'); -- Another case where padding is removed
+ encode
+--------
+ ZmY
+(1 row)
+
+SELECT decode(encode('\x6666', 'base64url'), 'base64url'); -- Should match original bytea
+ decode
+--------
+ \x6666
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 4deb0683d57..baa7bb32a76 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -738,7 +738,18 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
'base64'), 'base64');
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
-
+-- base64url
+SELECT encode('\x1234567890abcdef00', 'base64url');
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url');
+-- Special characters in binary data
+SELECT encode('\x00ff88ee77', 'base64url'); -- Includes null byte (\x00) and high-bit characters
+SELECT decode(encode('\x00ff88ee77', 'base64url'), 'base64url');
+-- Padding edge case (length % 3 = 1)
+SELECT encode('\x66', 'base64url'); -- Expected to remove padding
+SELECT decode(encode('\x66', 'base64url'), 'base64url');
+-- Padding edge case (length % 3 = 2)
+SELECT encode('\x6666', 'base64url'); -- Another case where padding is removed
+SELECT decode(encode('\x6666', 'base64url'), 'base64url'); -- Should match original bytea
--
-- get_bit/set_bit etc
--
--
2.48.1
v2-0002-Fix-declaration-after-statement.patch (application/octet-stream)
From 00bf1865d3109d8907eeb9fa58e0d7fa4f548464 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Tue, 11 Mar 2025 09:50:23 +0200
Subject: [PATCH v2 2/2] Fix declaration-after-statement
---
src/backend/utils/adt/encode.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 2547e746653..4887a601917 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -467,6 +467,7 @@ pg_base64url_decode(const char *src, size_t len, char *dst)
char *base64 = palloc(len + 4); /* Allocate extra space */
size_t i;
size_t pad_len;
+ uint64 decoded_len;
/* Convert Base64URL back to Base64 */
for (i = 0; i < len; i++)
@@ -484,7 +485,7 @@ pg_base64url_decode(const char *src, size_t len, char *dst)
while (pad_len--)
base64[i++] = '=';
- uint64 decoded_len = pg_base64_decode(base64, i, dst);
+ decoded_len = pg_base64_decode(base64, i, dst);
pfree(base64); /* Free allocated memory */
return decoded_len;
--
2.48.1
On Tue, Mar 11, 2025 at 10:08 AM Florents Tselai <florents.tselai@gmail.com> wrote:
On Tue, Mar 11, 2025 at 12:51 AM Cary Huang <cary.huang@highgo.ca> wrote:
Oh well - you're probably right.
I guess I was blinded by my convenience.
Adding a 'base64url' option there is more appropriate.
I agree with it too. It is neater to add "base64url" as a new option for
encode() and decode() SQL functions in encode.c.
Attaching a v2 with that.
In addition, you may also want to add the C versions of base64url encode
and decode functions to "src/common/base64.c" as new API calls so that
the frontend, backend applications and extensions can also have access
to these base64url conversions.
We could expose this in base64.c - it'll need some more checking.
A few more test cases, especially around padding, are necessary.
I'll come back to this.
Here's a v3 with some (hopefully) better test cases.
Attachments:
v3-base64url.patch (application/octet-stream)
From a3d31f4fe330761840d3e874223ae118926047b2 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Fri, 14 Mar 2025 20:51:33 +0200
Subject: [PATCH] base64url support for encode/decode functions. Refactored and
with better test cases
---
doc/src/sgml/func.sgml | 18 ++++
src/backend/utils/adt/encode.c | 126 ++++++++++++++++++++++++++
src/test/regress/expected/strings.out | 57 ++++++++++++
src/test/regress/sql/strings.sql | 18 ++++
4 files changed, 219 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1c3810e1a04..b47a09dbe05 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4951,6 +4951,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -5008,6 +5009,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is a URL-safe variant of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">RFC 4648
+ Section 5</ulink>. Unlike standard <literal>base64</literal>, it replaces
+ <literal>'+'</literal> with <literal>'-'</literal> and <literal>'/'</literal> with <literal>'_'</literal>
+ to ensure safe usage in URLs and filenames. Additionally, the padding character
+ <literal>'='</literal> is omitted.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..9522eecd4be 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -415,6 +415,126 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+/*
+ * Calculate the length of base64url encoded output for given input length
+ * Base64url encoding: 3 bytes -> 4 chars, padding to multiple of 4
+ */
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ uint64 result;
+
+ /*
+ * Base64 encoding converts 3 bytes into 4 characters
+ * Formula: ceil(srclen / 3) * 4
+ *
+ * Unlike standard base64, base64url doesn't use padding characters
+ * when the input length is not divisible by 3
+ */
+ result = (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by 4 */
+
+ return result;
+}
+
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /* For Base64, each 4 characters of input produce at most 3 bytes of output */
+ /* For Base64URL without padding, we need to round up to the nearest 4 */
+ size_t adjusted_len = srclen;
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ uint64 encoded_len;
+ if (len == 0)
+ return 0;
+
+ encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++) {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
+
+ /* Trim '=' padding */
+ while (encoded_len > 0 && dst[encoded_len - 1] == '=')
+ encoded_len--;
+
+ return encoded_len;
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ size_t i, pad_len, base64_len;
+ uint64 decoded_len;
+ char *base64;
+
+ /* Handle empty input specially */
+ if (len == 0)
+ return 0;
+
+ /* Calculate padding needed for standard base64 */
+ pad_len = 0;
+ if (len % 4 != 0)
+ pad_len = 4 - (len % 4);
+
+ /* Allocate memory for converted string */
+ base64_len = len + pad_len;
+ base64 = palloc(base64_len + 1); /* +1 for null terminator */
+
+ /* Convert Base64URL to Base64 */
+ for (i = 0; i < len; i++)
+ {
+ char c = src[i];
+ if (c == '-')
+ base64[i] = '+'; /* Convert '-' to '+' */
+ else if (c == '_')
+ base64[i] = '/'; /* Convert '_' to '/' */
+ else if ((c >= 'A' && c <= 'Z') ||
+ (c >= 'a' && c <= 'z') ||
+ (c >= '0' && c <= '9'))
+ base64[i] = c; /* Keep alphanumeric chars unchanged */
+ else if (c == '=')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url input"),
+ errhint("Base64URL encoding should not contain padding '='.")));
+ else if (c == '+' || c == '/')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url character: '%c'", c),
+ errhint("Base64URL should use '-' instead of '+' and '_' instead of '/'.")));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url character: '%c'", c)));
+ }
+
+ /* Add padding if necessary */
+ for (i = 0; i < pad_len; i++)
+ base64[len + i] = '=';
+
+ base64[base64_len] = '\0'; /* Null-terminate for safety */
+
+ /* Decode using the standard Base64 decoder */
+ decoded_len = pg_base64_decode(base64, base64_len, dst);
+
+ /* Free allocated memory */
+ pfree(base64);
+ return decoded_len;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +726,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fbe7d7be71f..80e2ff8426c 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2341,6 +2341,63 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- encode/decode Base64URL
+--
+SET bytea_output TO hex;
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+ encode
+----------
+ abc+/w==
+(1 row)
+
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Test basic encoding/decoding
+SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
+ encode
+--------------
+ EjRWeJCrze8A
+(1 row)
+
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Test with empty input
+SELECT encode('', 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url');
+ decode
+--------
+ \x
+(1 row)
+
+-- Test round-trip conversion
+SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+ encode
+------------------
+ SGVsbG8gV29ybGQh
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index ed054e6e99c..d14c1cac28f 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -743,6 +743,24 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- encode/decode Base64URL
+--
+SET bytea_output TO hex;
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+-- Test basic encoding/decoding
+SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+-- Test with empty input
+SELECT encode('', 'base64url');
+SELECT decode('', 'base64url');
+-- Test round-trip conversion
+SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+
--
-- get_bit/set_bit etc
--
--
2.48.1
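A side note on the two length helpers in the patch above: pg_base64url_enc_len() returns the padded base64 size, (srclen + 2) / 3 * 4, even though its comment says no padding is emitted. That is safe as an upper bound for buffer allocation, but it is not the exact output length. The sketch below shows how the exact unpadded sizes could be computed. The function names are hypothetical and not part of the patch, and overflow for very large inputs is ignored.

```c
#include <stddef.h>
#include <stdbool.h>

/* Exact length of unpadded base64url text for n input bytes. */
size_t
b64url_exact_enc_len(size_t n)
{
	size_t		rem = n % 3;	/* 0, 1 or 2 leftover bytes */

	/* 1 leftover byte needs 2 chars, 2 leftover bytes need 3 chars */
	return (n / 3) * 4 + (rem ? rem + 1 : 0);
}

/* An unpadded base64url string can never have length 4k + 1. */
bool
b64url_len_is_valid(size_t len)
{
	return len % 4 != 1;
}

/* Exact decoded size for a valid unpadded length (check validity first). */
size_t
b64url_exact_dec_len(size_t len)
{
	size_t		rem = len % 4;	/* 0, 2 or 3 for valid input */

	/* 2 trailing chars carry 1 byte, 3 trailing chars carry 2 bytes */
	return (len / 4) * 3 + (rem ? rem - 1 : 0);
}
```

These agree with the lengths in the regression tests: 4 input bytes give 6 characters (abc-_w) and 9 input bytes give 12 (EjRWeJCrze8A).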
Hi Florents,
Here's a v3 with some (hopefully) better test cases.
Thanks for the new version of the patch.
```
+ encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++) {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
```
Although it is a possible implementation, wouldn't it be better to
parametrize pg_base64_encode instead of traversing the string twice?
Same for pg_base64_decode. You can refactor pg_base64_encode and make
it a wrapper for pg_base64_encode_impl if needed.
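To illustrate the single-pass idea, here is a minimal standalone sketch. It is not the patch's code: b64_encode_sketch and the two alphabet tables are names made up for this example, and unlike pg_base64_encode it emits no line breaks. The caller is assumed to provide a dst buffer of at least 4 * ((len + 2) / 3) bytes.

```c
#include <stddef.h>
#include <stdbool.h>

static const char b64_std[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char b64_url[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Hypothetical helper: encode len bytes of src into dst in one pass.
 * With url = true the URL-safe alphabet is used and '=' padding is omitted.
 * Returns the number of characters written; dst is not NUL-terminated.
 */
size_t
b64_encode_sketch(const unsigned char *src, size_t len, char *dst, bool url)
{
	const char *alphabet = url ? b64_url : b64_std;
	char	   *p = dst;
	size_t		i = 0;
	unsigned int buf;

	/* every full 3-byte group becomes 4 output characters */
	for (; i + 3 <= len; i += 3)
	{
		buf = ((unsigned int) src[i] << 16) |
			((unsigned int) src[i + 1] << 8) |
			(unsigned int) src[i + 2];
		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		*p++ = alphabet[(buf >> 6) & 0x3f];
		*p++ = alphabet[buf & 0x3f];
	}

	if (len - i == 1)
	{
		/* one leftover byte: 2 characters, plus "==" for standard base64 */
		buf = (unsigned int) src[i] << 16;
		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		if (!url)
		{
			*p++ = '=';
			*p++ = '=';
		}
	}
	else if (len - i == 2)
	{
		/* two leftover bytes: 3 characters, plus "=" for standard base64 */
		buf = ((unsigned int) src[i] << 16) | ((unsigned int) src[i + 1] << 8);
		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		*p++ = alphabet[(buf >> 6) & 0x3f];
		if (!url)
			*p++ = '=';
	}

	return (size_t) (p - dst);
}
```

Selecting the alphabet up front means the output is written exactly once, with no second pass to rewrite '+' and '/' or to trim padding.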
```
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+ encode
+----------
+ abc+/w==
+(1 row)
+
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
```
I get the idea, but calling base64 is redundant IMO. It only takes
several CPU cycles during every test run without much value. I suggest
removing it and testing corner cases for base64url instead, which is
missing at the moment. Particularly there should be tests for
encoding/decoding strings of 0/1/2/3/4 characters and making sure that
decode(encode(x)) = x, always. On top of that you should cover with
tests the cases of invalid output for decode().
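The corner cases listed above can be made concrete with a small standalone decoder. This is only a sketch under assumptions: b64url_value and b64url_decode_sketch are hypothetical names, and the decoder is deliberately strict (it rejects '=', '+', '/' and whitespace), which is a design choice of this example and not necessarily what the patch should do. The caller is assumed to provide a dst buffer of at least len * 3 / 4 bytes.

```c
#include <stddef.h>

/* Map one base64url character to its 6-bit value, or -1 if it is invalid. */
static int
b64url_value(char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 26;
	if (c >= '0' && c <= '9')
		return c - '0' + 52;
	if (c == '-')
		return 62;
	if (c == '_')
		return 63;
	return -1;
}

/*
 * Hypothetical strict decoder for unpadded base64url.
 * Returns the number of bytes written, or -1 on invalid input.
 */
long
b64url_decode_sketch(const char *src, size_t len, unsigned char *dst)
{
	unsigned char *p = dst;
	unsigned int buf = 0;
	int			pos = 0;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		int			v = b64url_value(src[i]);

		if (v < 0)
			return -1;			/* character outside the alphabet */

		buf = (buf << 6) | (unsigned int) v;
		if (++pos == 4)
		{
			/* a full group of 4 characters yields 3 bytes */
			*p++ = (buf >> 16) & 0xff;
			*p++ = (buf >> 8) & 0xff;
			*p++ = buf & 0xff;
			buf = 0;
			pos = 0;
		}
	}

	if (pos == 1)
		return -1;				/* a single trailing character is never valid */
	if (pos == 2)
	{
		/* 12 bits collected: the top 8 are one data byte */
		*p++ = (buf >> 4) & 0xff;
	}
	else if (pos == 3)
	{
		/* 18 bits collected: the top 16 are two data bytes */
		*p++ = (buf >> 10) & 0xff;
		*p++ = (buf >> 2) & 0xff;
	}

	return (long) (p - dst);
}
```

With this sketch, inputs of 2, 3, 4 and 6 characters decode to 1, 2, 3 and 4 bytes, while a 5-character input or an input containing '@' is rejected.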
--
Best regards,
Aleksander Alekseev
Hi,
The strings.sql file contains this line:
SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
and the strings.out file contains:
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+ encode
+----------
+ abc+/w==
+(1 row)
+
Maybe you could drop the separate "Expected" comment and compare against the expected value directly, like this?
strings.sql
SELECT encode('\x69b73eff', 'base64') = 'abc+/w==';
strings.out
SELECT encode('\x69b73eff', 'base64') = 'abc+/w==';
?column?
----------
t
(1 row)
Regards,
Pavel
Thanks for the review, Aleksander.
On Mon, Mar 31, 2025 at 5:37 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
Hi Florents,
Here's a v3 with some (hopefully) better test cases.
Thanks for the new version of the patch.
```
+ encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++) {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
```
Although it is a possible implementation, wouldn't it be better to
parametrize pg_base64_encode instead of traversing the string twice?
Same for pg_base64_decode. You can refactor pg_base64_encode and make
it a wrapper for pg_base64_encode_impl if needed.
```
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+ encode
+----------
+ abc+/w==
+(1 row)
+
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
```
I get the idea, but calling base64 is redundant IMO. It only takes
several CPU cycles during every test run without much value. I suggest
removing it and testing corner cases for base64url instead, which is
missing at the moment. Particularly there should be tests for
encoding/decoding strings of 0/1/2/3/4 characters and making sure that
decode(encode(x)) = x, always. On top of that you should cover with
tests the cases of invalid output for decode().
--
Best regards,
Aleksander Alekseev
Here's a v4 patch set:
- Extracted pg_base64_{en,de}code_internal with an additional bool url param,
to be used by other functions
- Added a few more test cases
Cary mentioned above:
In addition, you may also want to add the C versions of base64url encode
and decode functions to "src/common/base64.c" as new API calls
I haven't done that, but I could,
although I think it'd probably be best to do it in a separate patch.
GH PR View: https://github.com/Florents-Tselai/postgres/pull/23
Attachments:
v4-0001-base64url-support-for-encode-decode-functions.-Re.patch (application/octet-stream)
From a3d31f4fe330761840d3e874223ae118926047b2 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Fri, 14 Mar 2025 20:51:33 +0200
Subject: [PATCH v4 1/3] base64url support for encode/decode functions.
Refactored and with better test cases
---
doc/src/sgml/func.sgml | 18 ++++
src/backend/utils/adt/encode.c | 126 ++++++++++++++++++++++++++
src/test/regress/expected/strings.out | 57 ++++++++++++
src/test/regress/sql/strings.sql | 18 ++++
4 files changed, 219 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1c3810e1a04..b47a09dbe05 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4951,6 +4951,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -5008,6 +5009,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is a URL-safe variant of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">RFC 4648
+ Section 5</ulink>. Unlike standard <literal>base64</literal>, it replaces
+ <literal>'+'</literal> with <literal>'-'</literal> and <literal>'/'</literal> with <literal>'_'</literal>
+ to ensure safe usage in URLs and filenames. Additionally, the padding character
+ <literal>'='</literal> is omitted.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..9522eecd4be 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -415,6 +415,126 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+/*
+ * Calculate the length of base64url encoded output for given input length
+ * Base64url encoding: 3 bytes -> 4 chars, padding to multiple of 4
+ */
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ uint64 result;
+
+ /*
+ * Base64 encoding converts 3 bytes into 4 characters
+ * Formula: ceil(srclen / 3) * 4
+ *
+ * Unlike standard base64, base64url doesn't use padding characters
+ * when the input length is not divisible by 3
+ */
+ result = (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by 4 */
+
+ return result;
+}
+
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /* For Base64, each 4 characters of input produce at most 3 bytes of output */
+ /* For Base64URL without padding, we need to round up to the nearest 4 */
+ size_t adjusted_len = srclen;
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ uint64 encoded_len;
+ if (len == 0)
+ return 0;
+
+ encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++) {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
+
+ /* Trim '=' padding */
+ while (encoded_len > 0 && dst[encoded_len - 1] == '=')
+ encoded_len--;
+
+ return encoded_len;
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ size_t i, pad_len, base64_len;
+ uint64 decoded_len;
+ char *base64;
+
+ /* Handle empty input specially */
+ if (len == 0)
+ return 0;
+
+ /* Calculate padding needed for standard base64 */
+ pad_len = 0;
+ if (len % 4 != 0)
+ pad_len = 4 - (len % 4);
+
+ /* Allocate memory for converted string */
+ base64_len = len + pad_len;
+ base64 = palloc(base64_len + 1); /* +1 for null terminator */
+
+ /* Convert Base64URL to Base64 */
+ for (i = 0; i < len; i++)
+ {
+ char c = src[i];
+ if (c == '-')
+ base64[i] = '+'; /* Convert '-' to '+' */
+ else if (c == '_')
+ base64[i] = '/'; /* Convert '_' to '/' */
+ else if ((c >= 'A' && c <= 'Z') ||
+ (c >= 'a' && c <= 'z') ||
+ (c >= '0' && c <= '9'))
+ base64[i] = c; /* Keep alphanumeric chars unchanged */
+ else if (c == '=')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url input"),
+ errhint("Base64URL encoding should not contain padding '='.")));
+ else if (c == '+' || c == '/')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url character: '%c'", c),
+ errhint("Base64URL should use '-' instead of '+' and '_' instead of '/'.")));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url character: '%c'", c)));
+ }
+
+ /* Add padding if necessary */
+ for (i = 0; i < pad_len; i++)
+ base64[len + i] = '=';
+
+ base64[base64_len] = '\0'; /* Null-terminate for safety */
+
+ /* Decode using the standard Base64 decoder */
+ decoded_len = pg_base64_decode(base64, base64_len, dst);
+
+ /* Free allocated memory */
+ pfree(base64);
+ return decoded_len;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +726,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fbe7d7be71f..80e2ff8426c 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2341,6 +2341,63 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- encode/decode Base64URL
+--
+SET bytea_output TO hex;
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+ encode
+----------
+ abc+/w==
+(1 row)
+
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Test basic encoding/decoding
+SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
+ encode
+--------------
+ EjRWeJCrze8A
+(1 row)
+
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Test with empty input
+SELECT encode('', 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url');
+ decode
+--------
+ \x
+(1 row)
+
+-- Test round-trip conversion
+SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+ encode
+------------------
+ SGVsbG8gV29ybGQh
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index ed054e6e99c..d14c1cac28f 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -743,6 +743,24 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- encode/decode Base64URL
+--
+SET bytea_output TO hex;
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+-- Test basic encoding/decoding
+SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+-- Test with empty input
+SELECT encode('', 'base64url');
+SELECT decode('', 'base64url');
+-- Test round-trip conversion
+SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+
--
-- get_bit/set_bit etc
--
--
2.49.0
v4-0003-Add-more-test-cases-for-shorter-inputs-and-errors.patch (application/octet-stream)
From 9e8d835669705f7e13da3b401a272995c4225531 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Wed, 4 Jun 2025 10:56:23 +0300
Subject: [PATCH v4 3/3] Add more test cases for shorter inputs and errors.
---
src/test/regress/expected/strings.out | 147 +++++++++++++++++++++-----
src/test/regress/sql/strings.sql | 64 ++++++++---
2 files changed, 170 insertions(+), 41 deletions(-)
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 80e2ff8426c..28ab4b52222 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2342,60 +2342,153 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
(1 row)
--
--- encode/decode Base64URL
+-- Base64URL encoding/decoding
--
SET bytea_output TO hex;
--- Flaghsip Test case against base64.
--- Notice the = padding removed at the end and special chars.
-SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
- encode
-----------
- abc+/w==
-(1 row)
-
-SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
encode
--------
abc-_w
(1 row)
-SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
decode
------------
\x69b73eff
(1 row)
--- Test basic encoding/decoding
-SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
- encode
---------------
- EjRWeJCrze8A
-(1 row)
-
-SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
decode
----------------------
\x1234567890abcdef00
(1 row)
--- Test with empty input
-SELECT encode('', 'base64url');
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
encode
--------
(1 row)
-SELECT decode('', 'base64url');
+SELECT decode('', 'base64url'); -- ''
decode
--------
\x
(1 row)
--- Test round-trip conversion
-SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
- encode
-------------------
- SGVsbG8gV29ybGQh
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ==
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64 sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
(1 row)
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d14c1cac28f..51b4de5bc90 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -744,22 +744,58 @@ SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
--
--- encode/decode Base64URL
+-- Base64URL encoding/decoding
--
SET bytea_output TO hex;
--- Flaghsip Test case against base64.
--- Notice the = padding removed at the end and special chars.
-SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
-SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
-SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
--- Test basic encoding/decoding
-SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
-SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
--- Test with empty input
-SELECT encode('', 'base64url');
-SELECT decode('', 'base64url');
--- Test round-trip conversion
-SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ==
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
--
-- get_bit/set_bit etc
--
2.49.0
v4-0002-Extract-pg_base64_-en-de-code_internal-with-an-ad.patch (application/octet-stream)
From b25fa01a5a9624c44574d2f00e2795f1e1a7058e Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Wed, 4 Jun 2025 10:40:31 +0300
Subject: [PATCH v4 2/3] Extract pg_base64_{en,de}code_internal with an
additional bool url param, to be used by other functions.
---
src/backend/utils/adt/encode.c | 206 +++++++++++++++------------------
1 file changed, 91 insertions(+), 115 deletions(-)
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 9522eecd4be..3f2dd448e2a 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,17 +288,15 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
- char *p,
- *lend = dst + 76;
- const char *s,
- *end = src + len;
- int pos = 2;
- uint32 buf = 0;
-
- s = src;
- p = dst;
+ const char *alphabet = url ? _base64url : _base64;
+ const char *end = src + len;
+ const char *s = src;
+ char *p = dst;
+ int pos = 2;
+ uint32 buf = 0;
+ char *lend = dst + 76;
while (s < end)
{
@@ -306,53 +307,84 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else
+ {
+ if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
- const char *srcend = src + len,
- *s = src;
- char *p = dst;
- char c;
- int b = 0;
- uint32 buf = 0;
- int pos = 0,
- end = 0;
+ const char *srcend = src + len;
+ const char *s = src;
+ char *p = dst;
+ char c;
+ int b = 0;
+ uint32 buf = 0;
+ int pos = 0;
+ int end = 0;
while (s < srcend)
{
c = *s++;
+ /* skip whitespace */
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert Base64URL to Base64 if needed */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
- /* end sequence */
if (!end)
{
if (pos == 2)
@@ -377,30 +409,49 @@ pg_base64_decode(const char *src, size_t len, char *dst)
errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
pg_mblen(s - 1), s - 1)));
}
- /* add it to buffer */
+
buf = (buf << 6) + b;
pos++;
+
if (pos == 4)
{
- *p++ = (buf >> 16) & 255;
+ *p++ = (buf >> 16) & 0xFF;
if (end == 0 || end > 1)
- *p++ = (buf >> 8) & 255;
+ *p++ = (buf >> 8) & 0xFF;
if (end == 0 || end > 2)
- *p++ = buf & 255;
+ *p++ = buf & 0xFF;
buf = 0;
pos = 0;
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12; /* 2 * 6 = 12 bits, pad remaining to 24 */
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6; /* 3 * 6 = 18 bits */
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -436,7 +487,6 @@ pg_base64url_enc_len(const char *src, size_t srclen)
return result;
}
-
static uint64
pg_base64url_dec_len(const char *src, size_t srclen)
{
@@ -452,87 +502,13 @@ pg_base64url_dec_len(const char *src, size_t srclen)
static uint64
pg_base64url_encode(const char *src, size_t len, char *dst)
{
- uint64 encoded_len;
- if (len == 0)
- return 0;
-
- encoded_len = pg_base64_encode(src, len, dst);
-
- /* Convert Base64 to Base64URL */
- for (uint64 i = 0; i < encoded_len; i++) {
- if (dst[i] == '+')
- dst[i] = '-';
- else if (dst[i] == '/')
- dst[i] = '_';
- }
-
- /* Trim '=' padding */
- while (encoded_len > 0 && dst[encoded_len - 1] == '=')
- encoded_len--;
-
- return encoded_len;
+ return pg_base64_encode_internal(src, len, dst, true);
}
static uint64
pg_base64url_decode(const char *src, size_t len, char *dst)
{
- size_t i, pad_len, base64_len;
- uint64 decoded_len;
- char *base64;
-
- /* Handle empty input specially */
- if (len == 0)
- return 0;
-
- /* Calculate padding needed for standard base64 */
- pad_len = 0;
- if (len % 4 != 0)
- pad_len = 4 - (len % 4);
-
- /* Allocate memory for converted string */
- base64_len = len + pad_len;
- base64 = palloc(base64_len + 1); /* +1 for null terminator */
-
- /* Convert Base64URL to Base64 */
- for (i = 0; i < len; i++)
- {
- char c = src[i];
- if (c == '-')
- base64[i] = '+'; /* Convert '-' to '+' */
- else if (c == '_')
- base64[i] = '/'; /* Convert '_' to '/' */
- else if ((c >= 'A' && c <= 'Z') ||
- (c >= 'a' && c <= 'z') ||
- (c >= '0' && c <= '9'))
- base64[i] = c; /* Keep alphanumeric chars unchanged */
- else if (c == '=')
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64url input"),
- errhint("Base64URL encoding should not contain padding '='.")));
- else if (c == '+' || c == '/')
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64url character: '%c'", c),
- errhint("Base64URL should use '-' instead of '+' and '_' instead of '/'.")));
- else
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64url character: '%c'", c)));
- }
-
- /* Add padding if necessary */
- for (i = 0; i < pad_len; i++)
- base64[len + i] = '=';
-
- base64[base64_len] = '\0'; /* Null-terminate for safety */
-
- /* Decode using the standard Base64 decoder */
- decoded_len = pg_base64_decode(base64, base64_len, dst);
-
- /* Free allocated memory */
- pfree(base64);
- return decoded_len;
+ return pg_base64_decode_internal(src, len, dst, true);
}
/*
--
2.49.0
Hi Florents,
Thanks for the update!
here's a v4 patch set
- Extracted pg_base64_{en,de}code_internal with an additional bool url param, to be used by other functions
- Added a few more test cases
Cary mentioned above
In addition, you may also want to add the C versions of base64url encode
and decode functions to "src/common/base64.c" as new API calls
Haven't done that, but I could;
Although I think it'd probably be best to do it in a separate patch.
I reviewed and tested v4. To me it looks as good as it will get.
Personally I would change a few minor things here and there and
probably merge all three patches into a single commit. This however is
up to the committer to decide.
Changing the CF entry status to "RfC".
Thanks for the review Aleksander,
On 9 Jul 2025, at 10:45 PM, Aleksander Alekseev <aleksander@tigerdata.com> wrote:
I reviewed and tested v4. To me it looks as good as it will get.
Personally I would change a few minor things here and there and
probably merge all three patches into a single commit. This however is
up to the committer to decide.
Attaching a single-file patch
Attachments:
v4-base64url.patch (application/octet-stream)
From a3d31f4fe330761840d3e874223ae118926047b2 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Fri, 14 Mar 2025 20:51:33 +0200
Subject: [PATCH v4 1/3] base64url support for encode/decode functions.
Refactored and with better test cases
---
doc/src/sgml/func.sgml | 18 ++++
src/backend/utils/adt/encode.c | 126 ++++++++++++++++++++++++++
src/test/regress/expected/strings.out | 57 ++++++++++++
src/test/regress/sql/strings.sql | 18 ++++
4 files changed, 219 insertions(+)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1c3810e1a04..b47a09dbe05 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4951,6 +4951,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -5008,6 +5009,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is a URL-safe variant of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">RFC 4648
+ Section 5</ulink>. Unlike standard <literal>base64</literal>, it replaces
+ <literal>'+'</literal> with <literal>'-'</literal> and <literal>'/'</literal> with <literal>'_'</literal>
+ to ensure safe usage in URLs and filenames. Additionally, the padding character
+ <literal>'='</literal> is omitted.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..9522eecd4be 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -415,6 +415,126 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+/*
+ * Calculate the length of base64url encoded output for given input length
+ * Base64url encoding: 3 bytes -> 4 chars, padding to multiple of 4
+ */
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ uint64 result;
+
+ /*
+ * Base64 encoding converts 3 bytes into 4 characters
+ * Formula: ceil(srclen / 3) * 4
+ *
+ * Unlike standard base64, base64url doesn't use padding characters
+ * when the input length is not divisible by 3
+ */
+ result = (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by 4 */
+
+ return result;
+}
+
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /* For Base64, each 4 characters of input produce at most 3 bytes of output */
+ /* For Base64URL without padding, we need to round up to the nearest 4 */
+ size_t adjusted_len = srclen;
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ uint64 encoded_len;
+ if (len == 0)
+ return 0;
+
+ encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++) {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
+
+ /* Trim '=' padding */
+ while (encoded_len > 0 && dst[encoded_len - 1] == '=')
+ encoded_len--;
+
+ return encoded_len;
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ size_t i, pad_len, base64_len;
+ uint64 decoded_len;
+ char *base64;
+
+ /* Handle empty input specially */
+ if (len == 0)
+ return 0;
+
+ /* Calculate padding needed for standard base64 */
+ pad_len = 0;
+ if (len % 4 != 0)
+ pad_len = 4 - (len % 4);
+
+ /* Allocate memory for converted string */
+ base64_len = len + pad_len;
+ base64 = palloc(base64_len + 1); /* +1 for null terminator */
+
+ /* Convert Base64URL to Base64 */
+ for (i = 0; i < len; i++)
+ {
+ char c = src[i];
+ if (c == '-')
+ base64[i] = '+'; /* Convert '-' to '+' */
+ else if (c == '_')
+ base64[i] = '/'; /* Convert '_' to '/' */
+ else if ((c >= 'A' && c <= 'Z') ||
+ (c >= 'a' && c <= 'z') ||
+ (c >= '0' && c <= '9'))
+ base64[i] = c; /* Keep alphanumeric chars unchanged */
+ else if (c == '=')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url input"),
+ errhint("Base64URL encoding should not contain padding '='.")));
+ else if (c == '+' || c == '/')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url character: '%c'", c),
+ errhint("Base64URL should use '-' instead of '+' and '_' instead of '/'.")));
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid base64url character: '%c'", c)));
+ }
+
+ /* Add padding if necessary */
+ for (i = 0; i < pad_len; i++)
+ base64[len + i] = '=';
+
+ base64[base64_len] = '\0'; /* Null-terminate for safety */
+
+ /* Decode using the standard Base64 decoder */
+ decoded_len = pg_base64_decode(base64, base64_len, dst);
+
+ /* Free allocated memory */
+ pfree(base64);
+ return decoded_len;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +726,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index fbe7d7be71f..80e2ff8426c 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2341,6 +2341,63 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- encode/decode Base64URL
+--
+SET bytea_output TO hex;
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+ encode
+----------
+ abc+/w==
+(1 row)
+
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Test basic encoding/decoding
+SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
+ encode
+--------------
+ EjRWeJCrze8A
+(1 row)
+
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Test with empty input
+SELECT encode('', 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url');
+ decode
+--------
+ \x
+(1 row)
+
+-- Test round-trip conversion
+SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+ encode
+------------------
+ SGVsbG8gV29ybGQh
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index ed054e6e99c..d14c1cac28f 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -743,6 +743,24 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- encode/decode Base64URL
+--
+SET bytea_output TO hex;
+-- Flaghsip Test case against base64.
+-- Notice the = padding removed at the end and special chars.
+SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
+SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+-- Test basic encoding/decoding
+SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+-- Test with empty input
+SELECT encode('', 'base64url');
+SELECT decode('', 'base64url');
+-- Test round-trip conversion
+SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+
--
-- get_bit/set_bit etc
--
--
2.49.0
From b25fa01a5a9624c44574d2f00e2795f1e1a7058e Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Wed, 4 Jun 2025 10:40:31 +0300
Subject: [PATCH v4 2/3] Extract pg_base64_{en,de}code_internal with an
additional bool url param, to be used by other functions.
---
src/backend/utils/adt/encode.c | 206 +++++++++++++++------------------
1 file changed, 91 insertions(+), 115 deletions(-)
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 9522eecd4be..3f2dd448e2a 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,17 +288,15 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
- char *p,
- *lend = dst + 76;
- const char *s,
- *end = src + len;
- int pos = 2;
- uint32 buf = 0;
-
- s = src;
- p = dst;
+ const char *alphabet = url ? _base64url : _base64;
+ const char *end = src + len;
+ const char *s = src;
+ char *p = dst;
+ int pos = 2;
+ uint32 buf = 0;
+ char *lend = dst + 76;
while (s < end)
{
@@ -306,53 +307,84 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else
+ {
+ if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
- const char *srcend = src + len,
- *s = src;
- char *p = dst;
- char c;
- int b = 0;
- uint32 buf = 0;
- int pos = 0,
- end = 0;
+ const char *srcend = src + len;
+ const char *s = src;
+ char *p = dst;
+ char c;
+ int b = 0;
+ uint32 buf = 0;
+ int pos = 0;
+ int end = 0;
while (s < srcend)
{
c = *s++;
+ /* skip whitespace */
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert Base64URL to Base64 if needed */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
- /* end sequence */
if (!end)
{
if (pos == 2)
@@ -377,30 +409,49 @@ pg_base64_decode(const char *src, size_t len, char *dst)
errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
pg_mblen(s - 1), s - 1)));
}
- /* add it to buffer */
+
buf = (buf << 6) + b;
pos++;
+
if (pos == 4)
{
- *p++ = (buf >> 16) & 255;
+ *p++ = (buf >> 16) & 0xFF;
if (end == 0 || end > 1)
- *p++ = (buf >> 8) & 255;
+ *p++ = (buf >> 8) & 0xFF;
if (end == 0 || end > 2)
- *p++ = buf & 255;
+ *p++ = buf & 0xFF;
buf = 0;
pos = 0;
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12; /* 2 * 6 = 12 bits, pad remaining to 24 */
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6; /* 3 * 6 = 18 bits */
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -436,7 +487,6 @@ pg_base64url_enc_len(const char *src, size_t srclen)
return result;
}
-
static uint64
pg_base64url_dec_len(const char *src, size_t srclen)
{
@@ -452,87 +502,13 @@ pg_base64url_dec_len(const char *src, size_t srclen)
static uint64
pg_base64url_encode(const char *src, size_t len, char *dst)
{
- uint64 encoded_len;
- if (len == 0)
- return 0;
-
- encoded_len = pg_base64_encode(src, len, dst);
-
- /* Convert Base64 to Base64URL */
- for (uint64 i = 0; i < encoded_len; i++) {
- if (dst[i] == '+')
- dst[i] = '-';
- else if (dst[i] == '/')
- dst[i] = '_';
- }
-
- /* Trim '=' padding */
- while (encoded_len > 0 && dst[encoded_len - 1] == '=')
- encoded_len--;
-
- return encoded_len;
+ return pg_base64_encode_internal(src, len, dst, true);
}
static uint64
pg_base64url_decode(const char *src, size_t len, char *dst)
{
- size_t i, pad_len, base64_len;
- uint64 decoded_len;
- char *base64;
-
- /* Handle empty input specially */
- if (len == 0)
- return 0;
-
- /* Calculate padding needed for standard base64 */
- pad_len = 0;
- if (len % 4 != 0)
- pad_len = 4 - (len % 4);
-
- /* Allocate memory for converted string */
- base64_len = len + pad_len;
- base64 = palloc(base64_len + 1); /* +1 for null terminator */
-
- /* Convert Base64URL to Base64 */
- for (i = 0; i < len; i++)
- {
- char c = src[i];
- if (c == '-')
- base64[i] = '+'; /* Convert '-' to '+' */
- else if (c == '_')
- base64[i] = '/'; /* Convert '_' to '/' */
- else if ((c >= 'A' && c <= 'Z') ||
- (c >= 'a' && c <= 'z') ||
- (c >= '0' && c <= '9'))
- base64[i] = c; /* Keep alphanumeric chars unchanged */
- else if (c == '=')
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64url input"),
- errhint("Base64URL encoding should not contain padding '='.")));
- else if (c == '+' || c == '/')
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64url character: '%c'", c),
- errhint("Base64URL should use '-' instead of '+' and '_' instead of '/'.")));
- else
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64url character: '%c'", c)));
- }
-
- /* Add padding if necessary */
- for (i = 0; i < pad_len; i++)
- base64[len + i] = '=';
-
- base64[base64_len] = '\0'; /* Null-terminate for safety */
-
- /* Decode using the standard Base64 decoder */
- decoded_len = pg_base64_decode(base64, base64_len, dst);
-
- /* Free allocated memory */
- pfree(base64);
- return decoded_len;
+ return pg_base64_decode_internal(src, len, dst, true);
}
/*
--
2.49.0
From 9e8d835669705f7e13da3b401a272995c4225531 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Wed, 4 Jun 2025 10:56:23 +0300
Subject: [PATCH v4 3/3] Add more test cases for shorter inputs and errors.
---
src/test/regress/expected/strings.out | 147 +++++++++++++++++++++-----
src/test/regress/sql/strings.sql | 64 ++++++++---
2 files changed, 170 insertions(+), 41 deletions(-)
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 80e2ff8426c..28ab4b52222 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2342,60 +2342,153 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
(1 row)
--
--- encode/decode Base64URL
+-- Base64URL encoding/decoding
--
SET bytea_output TO hex;
--- Flaghsip Test case against base64.
--- Notice the = padding removed at the end and special chars.
-SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
- encode
-----------
- abc+/w==
-(1 row)
-
-SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
encode
--------
abc-_w
(1 row)
-SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
decode
------------
\x69b73eff
(1 row)
--- Test basic encoding/decoding
-SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
- encode
---------------
- EjRWeJCrze8A
-(1 row)
-
-SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
decode
----------------------
\x1234567890abcdef00
(1 row)
--- Test with empty input
-SELECT encode('', 'base64url');
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
encode
--------
(1 row)
-SELECT decode('', 'base64url');
+SELECT decode('', 'base64url'); -- ''
decode
--------
\x
(1 row)
--- Test round-trip conversion
-SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
- encode
-------------------
- SGVsbG8gV29ybGQh
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ==
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64 sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
(1 row)
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index d14c1cac28f..51b4de5bc90 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -744,22 +744,58 @@ SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
--
--- encode/decode Base64URL
+-- Base64URL encoding/decoding
--
SET bytea_output TO hex;
--- Flaghsip Test case against base64.
--- Notice the = padding removed at the end and special chars.
-SELECT encode('\x69b73eff', 'base64'); -- Expected: abc+/w==
-SELECT encode('\x69b73eff', 'base64url'); -- Expected: abc-_w
-SELECT decode(encode('\x69b73eff', 'base64url'), 'base64url');
--- Test basic encoding/decoding
-SELECT encode('\x1234567890abcdef00', 'base64url'); -- Expected: EjRWeJCrze8A
-SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- Expected: \x1234567890abcdef00
--- Test with empty input
-SELECT encode('', 'base64url');
-SELECT decode('', 'base64url');
--- Test round-trip conversion
-SELECT encode(decode('SGVsbG8gV29ybGQh', 'base64url'), 'base64url'); -- Expected: SGVsbG8gV29ybGQh (decodes to "Hello World!")
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ==
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
--
-- get_bit/set_bit etc
--
2.49.0
Hi Florents,
On Jul 9, 2025, at 23:25, Florents Tselai <florents.tselai@gmail.com> wrote:
I reviewed and tested v4. To me it looks as good as it will get.
Personally I would change a few minor things here and there and
probably merge all three patches into a single commit. This however is
up to the committer to decide.
Attaching a single-file patch
Somehow missed this thread previously. Had a quick look and had the same question Aleksander asked up-thread:
Although it is a possible implementation, wouldn't it be better to
parametrize pg_base64_encode instead of traversing the string twice?
Same for pg_base64_decode. You can refactor pg_base64_encode and make
it a wrapper for pg_base64_encode_impl if needed.
It looks as though there could be complements to _base64 and b64lookup:
```patch
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -284,6 +287,18 @@ static const int8 b64lookup[128] = {
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
};
+static const int8 b64urllookup[128] = {
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1,
+ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
+ -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
+ 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, 63,
+ -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
+ 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
+};
+
+
static uint64
pg_base64_encode(const char *src, size_t len, char *dst)
{
```
And then add implementation functions that take the proper lookup table as an argument.
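To make the lookup-table idea concrete, here is a rough standalone sketch, not code from the patch. The names build_lookup and sketch_b64_decode are hypothetical, and the error handling is simplified to a -1 return instead of ereport(). The table is derived from the alphabet string rather than typed by hand, which avoids mistakes in individual slots. The sketch is also more lenient than the backend decoder: it ignores any run of trailing '=' and does not skip whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char std_alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char url_alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Hypothetical helper: table[c] = 6-bit value of c, or -1 if c is invalid. */
static void
build_lookup(const char *alphabet, int8_t table[128])
{
	memset(table, -1, 128);
	for (int i = 0; i < 64; i++)
		table[(unsigned char) alphabet[i]] = (int8_t) i;
}

/*
 * Hypothetical decoder parameterized by a lookup table.  Accepts input
 * with or without trailing '=' padding.  Returns the number of bytes
 * written, or -1 on invalid input.  dst must hold at least
 * 3 * ((len + 3) / 4) bytes.
 */
static long
sketch_b64_decode(const char *src, size_t len, unsigned char *dst,
				  const int8_t table[128])
{
	uint32_t	buf = 0;
	int			pos = 0;
	size_t		n = 0;

	assert(src != NULL && dst != NULL);

	/* lenient: drop any trailing padding before decoding */
	while (len > 0 && src[len - 1] == '=')
		len--;

	for (size_t i = 0; i < len; i++)
	{
		unsigned char c = (unsigned char) src[i];

		if (c >= 128 || table[c] < 0)
			return -1;
		buf = (buf << 6) | (uint32_t) table[c];
		if (++pos == 4)
		{
			dst[n++] = (unsigned char) ((buf >> 16) & 0xFF);
			dst[n++] = (unsigned char) ((buf >> 8) & 0xFF);
			dst[n++] = (unsigned char) (buf & 0xFF);
			buf = 0;
			pos = 0;
		}
	}

	if (pos == 1)
		return -1;				/* a lone character carries only 6 bits */
	if (pos == 2)
		dst[n++] = (unsigned char) ((buf >> 4) & 0xFF);	/* 12 bits -> 1 byte */
	else if (pos == 3)
	{
		dst[n++] = (unsigned char) ((buf >> 10) & 0xFF);	/* 18 bits -> 2 bytes */
		dst[n++] = (unsigned char) ((buf >> 2) & 0xFF);
	}
	return (long) n;
}
```

With the URL table, '-' maps to 62 and '_' to 63, so "abc-_w" decodes to \x69b73eff, the same bytes as "abc+/w==" with the standard table.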
Best,
David
On 10 Jul 2025, at 10:07 PM, David E. Wheeler <david@justatheory.com> wrote:
It looks as though there could be complements to _base64 and b64lookup:
[...]
And then add implementation functions that take the proper lookup table as an argument.
Why isn’t this sufficient?
static uint64
pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *alphabet = url ? _base64url : _base64;
There’s already a bool url param, and the alphabet is toggled based on that.
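For anyone following along without the patch applied, a minimal standalone sketch of the same idea might look like this. It is an illustration under assumptions, not the patch's code: sketch_b64_encode is a hypothetical name, it NUL-terminates its output for convenience, and unlike the backend encoder it never inserts line breaks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char std_alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char url_alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Hypothetical single-pass encoder.  The url flag selects the alphabet
 * and suppresses '=' padding.  dst must hold at least
 * 4 * ((len + 2) / 3) + 1 bytes.  Returns the encoded length.
 */
static size_t
sketch_b64_encode(const unsigned char *src, size_t len, char *dst, bool url)
{
	const char *alphabet = url ? url_alphabet : std_alphabet;
	char	   *p = dst;
	size_t		i = 0;

	assert(src != NULL && dst != NULL);

	/* full groups: 3 input bytes -> 4 output characters */
	for (; i + 3 <= len; i += 3)
	{
		uint32_t	buf = ((uint32_t) src[i] << 16) |
			((uint32_t) src[i + 1] << 8) |
			(uint32_t) src[i + 2];

		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		*p++ = alphabet[(buf >> 6) & 0x3f];
		*p++ = alphabet[buf & 0x3f];
	}

	if (len - i == 1)
	{
		/* 1 trailing byte -> 2 characters, plus "==" for standard base64 */
		uint32_t	buf = (uint32_t) src[i] << 16;

		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		if (!url)
		{
			*p++ = '=';
			*p++ = '=';
		}
	}
	else if (len - i == 2)
	{
		/* 2 trailing bytes -> 3 characters, plus "=" for standard base64 */
		uint32_t	buf = ((uint32_t) src[i] << 16) |
			((uint32_t) src[i + 1] << 8);

		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		*p++ = alphabet[(buf >> 6) & 0x3f];
		if (!url)
			*p++ = '=';
	}

	*p = '\0';
	return (size_t) (p - dst);
}
```

Encoding the bytes 69 b7 3e ff with url set to false gives abc+/w==, and with url set to true gives abc-_w, matching the regression test in the patch.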
On Jul 10, 2025, at 16:38, Florents Tselai <florents.tselai@gmail.com> wrote:
Why isn’t this sufficient?
static uint64
pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *alphabet = url ? _base64url : _base64;
Ah, it is. I hadn’t got that far. I was tripped up to see this in your patch:
```patch
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ uint64 encoded_len;
+ if (len == 0)
+ return 0;
+
+ encoded_len = pg_base64_encode(src, len, dst);
+
+ /* Convert Base64 to Base64URL */
+ for (uint64 i = 0; i < encoded_len; i++) {
+ if (dst[i] == '+')
+ dst[i] = '-';
+ else if (dst[i] == '/')
+ dst[i] = '_';
+ }
+
+ /* Trim '=' padding */
+ while (encoded_len > 0 && dst[encoded_len - 1] == '=')
+ encoded_len--;
+
+ return encoded_len;
+}
```
I didn’t realize it was a set of patches for stuff you did and then later undid. Could you flatten the patch into just what’s changed at the end?
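For comparison, the translate-and-trim step of that earlier version can be isolated as below. This is only a sketch: b64_to_b64url_inplace is a hypothetical helper name, not something in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite standard base64 text in place as
 * base64url and return the new length with trailing '=' removed.
 */
static size_t
b64_to_b64url_inplace(char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	/* second pass over the already-encoded text */
	for (size_t i = 0; i < len; i++)
	{
		if (buf[i] == '+')
			buf[i] = '-';
		else if (buf[i] == '/')
			buf[i] = '_';
	}

	/* trim '=' padding */
	while (len > 0 && buf[len - 1] == '=')
		len--;

	return len;
}
```

One thing to keep in mind with this approach: the standard backend encoder also emits a newline after every 76 output characters, and a pass like this leaves those in place. The later refactoring skips the line breaks when url is true.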
Best,
David
On Thu, Jul 10, 2025 at 11:55 PM David E. Wheeler <david@justatheory.com>
wrote:
I didn’t realize it was a set of patches for stuff you did and then later
undid. Could you flatten the patch into just what’s changed at the end?
Attached
Attachments:
v4-0001-Add-base64url.patch (application/octet-stream)
From ed46c1d172297c5238d8446c6f51eb33587d2d4a Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Fri, 11 Jul 2025 11:22:08 +0300
Subject: [PATCH v4] Add base64url
---
doc/src/sgml/func.sgml | 18 +++
src/backend/utils/adt/encode.c | 178 ++++++++++++++++++++------
src/test/regress/expected/strings.out | 150 ++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 362 insertions(+), 38 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 810b2b50f0d..8d0bce29d5e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4999,6 +4999,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -5056,6 +5057,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is a URL-safe variant of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">RFC 4648
+ Section 5</ulink>. Unlike standard <literal>base64</literal>, it replaces
+ <literal>'+'</literal> with <literal>'-'</literal> and <literal>'/'</literal> with <literal>'_'</literal>
+ to ensure safe usage in URLs and filenames. Additionally, the padding character
+ <literal>'='</literal> is omitted.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..3f2dd448e2a 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,17 +288,15 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
- char *p,
- *lend = dst + 76;
- const char *s,
- *end = src + len;
- int pos = 2;
- uint32 buf = 0;
-
- s = src;
- p = dst;
+ const char *alphabet = url ? _base64url : _base64;
+ const char *end = src + len;
+ const char *s = src;
+ char *p = dst;
+ int pos = 2;
+ uint32 buf = 0;
+ char *lend = dst + 76;
while (s < end)
{
@@ -306,53 +307,84 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else
+ {
+ if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
- const char *srcend = src + len,
- *s = src;
- char *p = dst;
- char c;
- int b = 0;
- uint32 buf = 0;
- int pos = 0,
- end = 0;
+ const char *srcend = src + len;
+ const char *s = src;
+ char *p = dst;
+ char c;
+ int b = 0;
+ uint32 buf = 0;
+ int pos = 0;
+ int end = 0;
while (s < srcend)
{
c = *s++;
+ /* skip whitespace */
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert Base64URL to Base64 if needed */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
- /* end sequence */
if (!end)
{
if (pos == 2)
@@ -377,30 +409,49 @@ pg_base64_decode(const char *src, size_t len, char *dst)
errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
pg_mblen(s - 1), s - 1)));
}
- /* add it to buffer */
+
buf = (buf << 6) + b;
pos++;
+
if (pos == 4)
{
- *p++ = (buf >> 16) & 255;
+ *p++ = (buf >> 16) & 0xFF;
if (end == 0 || end > 1)
- *p++ = (buf >> 8) & 255;
+ *p++ = (buf >> 8) & 0xFF;
if (end == 0 || end > 2)
- *p++ = buf & 255;
+ *p++ = buf & 0xFF;
buf = 0;
pos = 0;
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12; /* 2 * 6 = 12 bits, pad remaining to 24 */
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6; /* 3 * 6 = 18 bits */
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +466,51 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+/*
+ * Calculate the length of base64url encoded output for given input length
+ * Base64url encoding: 3 bytes -> 4 chars, padding to multiple of 4
+ */
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ uint64 result;
+
+ /*
+ * Base64 encoding converts 3 bytes into 4 characters
+ * Formula: ceil(srclen / 3) * 4
+ *
+ * Unlike standard base64, base64url doesn't use padding characters
+ * when the input length is not divisible by 3
+ */
+ result = (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by 4 */
+
+ return result;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /* For Base64, each 4 characters of input produce at most 3 bytes of output */
+ /* For Base64URL without padding, we need to round up to the nearest 4 */
+ size_t adjusted_len = srclen;
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +702,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 788844abd20..e76f30a63eb 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2462,6 +2462,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ==
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64 sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 2577a42987d..ac26d892006 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -774,6 +774,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ==
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.49.0
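For anyone reading the diff above without the source tree at hand, the encoder side of the change can be sketched as a standalone C function. This is only an illustration under stated assumptions, not the code in encode.c: the names sketch_std, sketch_url and sketch_b64_encode are made up for this example, and the line wrapping is simplified to a newline after every 76 output characters. It shows the three things the url flag switches: the alphabet, the '=' padding, and the line wrapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the two alphabets differ only in the last two symbols. */
static const char sketch_std[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char sketch_url[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Encode src[0..len) into dst and return the number of characters written.
 * No NUL terminator is written; the caller must provide a large enough dst.
 */
static size_t
sketch_b64_encode(const unsigned char *src, size_t len, char *dst, bool url)
{
	const char *alphabet = url ? sketch_url : sketch_std;
	char	   *p = dst;
	size_t		i = 0;
	size_t		line = 0;		/* characters written on the current line */

	assert(src != NULL && dst != NULL);

	/* Every full 3-byte group becomes 4 output characters. */
	while (len - i >= 3)
	{
		uint32_t	buf = ((uint32_t) src[i] << 16) |
			((uint32_t) src[i + 1] << 8) |
			(uint32_t) src[i + 2];

		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		*p++ = alphabet[(buf >> 6) & 0x3f];
		*p++ = alphabet[buf & 0x3f];
		i += 3;
		line += 4;

		/* Only the standard variant wraps, here after 76 characters. */
		if (!url && line == 76)
		{
			*p++ = '\n';
			line = 0;
		}
	}

	/* Trailing group of 1 or 2 bytes; only the standard variant pads. */
	if (len - i == 1)
	{
		uint32_t	buf = (uint32_t) src[i] << 16;

		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		if (!url)
		{
			*p++ = '=';
			*p++ = '=';
		}
	}
	else if (len - i == 2)
	{
		uint32_t	buf = ((uint32_t) src[i] << 16) |
			((uint32_t) src[i + 1] << 8);

		*p++ = alphabet[(buf >> 18) & 0x3f];
		*p++ = alphabet[(buf >> 12) & 0x3f];
		*p++ = alphabet[(buf >> 6) & 0x3f];
		if (!url)
			*p++ = '=';
	}

	return (size_t) (p - dst);
}
```

With the bytes 0x69 0xb7 0x3e 0xff from the regression test, this sketch writes abc-_w when url is true and abc+/w== when it is false.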
On Jul 11, 2025, at 04:26, Florents Tselai <florents.tselai@gmail.com> wrote:
Attached
Thank you! This looks great. The attached revision makes a couple of minor changes:
* Changed the line wrap of the docs to be more like the rest of func.sgml
* Removed an unnecessary nested if statement
* Removed `==` from one of the test comments
* Ran pgindent to create the attached patch
A few other brief comments, entirely stylistic:
* I kind of expected pg_base64url_encode to appear immediately after pg_base64_encode. In other words, to see the two uses of pg_base64_encode_internal adjacent to each other. I think that’s more typical of the project standard. Same for the functions that call pg_base64_decode_internal.
* There are a few places where variable definition has been changed without changing the meaning, for example:
- const char *srcend = src + len,
- *s = src;
+ const char *srcend = src + len;
+ const char *s = src;
Even if this is desirable, it might make sense to defer pure formatting changes to a separate patch.
* You define return variables in functions like pg_base64url_enc_len rather than just returning the outcome of an expression. The latter is what I see in pg_base64_enc_len, so I think it would be more consistent. In other words:
```patch
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -470,8 +470,6 @@ pg_base64_dec_len(const char *src, size_t srclen)
static uint64
pg_base64url_enc_len(const char *src, size_t srclen)
{
- uint64 result;
-
/*
* Base64 encoding converts 3 bytes into 4 characters Formula: ceil(srclen
* / 3) * 4
@@ -479,10 +477,8 @@ pg_base64url_enc_len(const char *src, size_t srclen)
* Unlike standard base64, base64url doesn't use padding characters when
* the input length is not divisible by 3
*/
- result = (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by
+ return (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by
* 4 */
-
- return result;
}
static uint64
```
I suspect these are the sorts of things a committer would tweak/adjust before committing, just thinking about getting ahead of that. I think it’s ready.
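As an aside on the length expression discussed above: (srclen + 2) / 3 * 4 is the padded length, so for unpadded base64url it overestimates by up to two characters. As far as I can tell that is harmless if the length callback only has to give an upper bound for sizing the output buffer, but the arithmetic is easy to check with a small sketch. The helper names padded_len, unpadded_len and dec_bound are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Padded length, the expression used in the patch: ceil(n / 3) * 4. */
static uint64_t
padded_len(size_t n)
{
	return ((uint64_t) n + 2) / 3 * 4;
}

/* Exact length without '=' padding: ceil(4 * n / 3). */
static uint64_t
unpadded_len(size_t n)
{
	return ((uint64_t) n * 4 + 2) / 3;
}

/*
 * Decoded-size bound as in the patch: round the input length up to a
 * multiple of 4, then take 3 output bytes per 4 input characters.
 */
static uint64_t
dec_bound(size_t n)
{
	uint64_t	adjusted = (uint64_t) n;

	if (adjusted % 4 != 0)
		adjusted += 4 - (adjusted % 4);
	return adjusted * 3 / 4;
}
```

For 1, 2, 3 and 4 input bytes the unpadded lengths are 2, 3, 4 and 6 characters, which matches the AQ, AQI, AQID and 3q2-7w outputs in the regression tests, while the padded expression gives 4, 4, 4 and 8.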
Best,
David
Attachments:
v4-0001-Add-base64url.patch (application/octet-stream)
From 3398db2e87d8e4658655aa8cacc6a94974fa1b2d Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Sat, 12 Jul 2025 15:12:17 -0400
Subject: [PATCH v4] Add base64url
---
doc/src/sgml/func.sgml | 19 +++
src/backend/utils/adt/encode.c | 168 +++++++++++++++++++++-----
src/test/regress/expected/strings.out | 150 +++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 +++++++++
4 files changed, 359 insertions(+), 32 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c28aa71f570..34c8d4990c2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4999,6 +4999,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -5056,6 +5057,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is a URL-safe variant of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">RFC 4648
+ Section 5</ulink> <literal>base64</literal>, that replaces
+ <literal>'+'</literal> with <literal>'-'</literal> and
+ <literal>'/'</literal> with <literal>'_'</literal> to ensure safe usage
+ in URLs and filenames. It also omits the <literal>'='</literal> padding
+ character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..9359800ff14 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,17 +288,15 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
- char *p,
- *lend = dst + 76;
- const char *s,
- *end = src + len;
+ const char *alphabet = url ? _base64url : _base64;
+ const char *end = src + len;
+ const char *s = src;
+ char *p = dst;
int pos = 2;
uint32 buf = 0;
-
- s = src;
- p = dst;
+ char *lend = dst + 76;
while (s < end)
{
@@ -306,53 +307,81 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
{
- const char *srcend = src + len,
- *s = src;
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
+{
+ const char *srcend = src + len;
+ const char *s = src;
char *p = dst;
char c;
int b = 0;
uint32 buf = 0;
- int pos = 0,
- end = 0;
+ int pos = 0;
+ int end = 0;
while (s < srcend)
{
c = *s++;
+ /* skip whitespace */
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert Base64URL to Base64 if needed */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
- /* end sequence */
if (!end)
{
if (pos == 2)
@@ -377,30 +406,49 @@ pg_base64_decode(const char *src, size_t len, char *dst)
errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
pg_mblen(s - 1), s - 1)));
}
- /* add it to buffer */
+
buf = (buf << 6) + b;
pos++;
+
if (pos == 4)
{
- *p++ = (buf >> 16) & 255;
+ *p++ = (buf >> 16) & 0xFF;
if (end == 0 || end > 1)
- *p++ = (buf >> 8) & 255;
+ *p++ = (buf >> 8) & 0xFF;
if (end == 0 || end > 2)
- *p++ = buf & 255;
+ *p++ = buf & 0xFF;
buf = 0;
pos = 0;
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12; /* 2 * 6 = 12 bits, pad remaining to 24 */
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6; /* 3 * 6 = 18 bits */
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +463,56 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+/*
+ * Calculate the length of base64url encoded output for given input length
+ * Base64url encoding: 3 bytes -> 4 chars, padding to multiple of 4
+ */
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ uint64 result;
+
+ /*
+ * Base64 encoding converts 3 bytes into 4 characters Formula: ceil(srclen
+ * / 3) * 4
+ *
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ result = (srclen + 2) / 3 * 4; /* ceiling division by 3, then multiply by
+ * 4 */
+
+ return result;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For Base64, each 4 characters of input produce at most 3 bytes of
+ * output
+ */
+ /* For Base64URL without padding, we need to round up to the nearest 4 */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +704,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 788844abd20..ae5da7bde82 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2462,6 +2462,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64 sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 2577a42987d..fb49f564936 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -774,6 +774,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.49.0
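The decoder half of the patch above, in particular the handling of a trailing group of two or three characters, can be sketched as a standalone function. This is a simplified illustration and not the code in encode.c: url_value and sketch_b64url_decode are hypothetical names, ASCII is assumed, and unlike the patch the sketch does not skip whitespace or accept optional '=' padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one base64url character to its 6-bit value, or -1 if it is invalid. */
static int
url_value(char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A';
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 26;
	if (c >= '0' && c <= '9')
		return c - '0' + 52;
	if (c == '-')
		return 62;
	if (c == '_')
		return 63;
	return -1;
}

/*
 * Decode unpadded base64url from src[0..len) into dst.
 * Returns the number of bytes written, or -1 on invalid input.
 */
static long
sketch_b64url_decode(const char *src, size_t len, unsigned char *dst)
{
	unsigned char *p = dst;
	uint32_t	buf = 0;
	int			pos = 0;		/* characters collected in the current group */

	assert(src != NULL && dst != NULL);

	for (size_t i = 0; i < len; i++)
	{
		int			v = url_value(src[i]);

		if (v < 0)
			return -1;			/* invalid symbol */

		buf = (buf << 6) | (uint32_t) v;
		if (++pos == 4)
		{
			/* A full group of 4 characters yields 3 bytes. */
			*p++ = (buf >> 16) & 0xFF;
			*p++ = (buf >> 8) & 0xFF;
			*p++ = buf & 0xFF;
			buf = 0;
			pos = 0;
		}
	}

	if (pos == 1)
		return -1;				/* a single leftover character is never valid */

	if (pos == 2)
	{
		buf <<= 12;				/* 12 bits collected, align to 24 */
		*p++ = (buf >> 16) & 0xFF;
	}
	else if (pos == 3)
	{
		buf <<= 6;				/* 18 bits collected, align to 24 */
		*p++ = (buf >> 16) & 0xFF;
		*p++ = (buf >> 8) & 0xFF;
	}

	return (long) (p - dst);
}
```

On the inputs used in the regression tests, QQ decodes to the single byte 0x41, QQI to 0x41 0x02, and QQIDQ is rejected because one leftover character cannot carry a whole byte.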
On 12 Jul 2025, at 21:40, David E. Wheeler <david@justatheory.com> wrote:
Thank you! This looks great. The attached revision makes a couple of minor changes:
I also had a look at this today and agree that it looks pretty close to being
done, and a feature we IMHO would like to have.
* I kind of expected pg_base64url_encode to appear immediately after pg_base64_encode. In other words, to see the two uses of pg_base64_encode_internal adjacent to each other. I think that’s more typical of the project standard. Same for the functions that call pg_base64_decode_internal.
+1, done in the attached.
* There are a few places where variable definition has been changed without changing the meaning, for example:
...
Even if this is desirable, it might make sense to defer pure formatting changes to a separate patch.
Agreed, the attached reverts stylistic changes.
* You define return variables in functions like pg_base64url_enc_len rather than just returning the outcome of an expression. The latter is what I see in pg_base64_enc_len, so I think would be more consistent.
+1, done in the attached.
The attached version also adds a commit message, tweaks the documentation along
with a few small changes to error message handling etc.
The base64 code this extends is the RFC 2045 variant while base64url is based
on base64 from RFC 3548 (obsoleted by RFC 4648). AFAICT this is not a problem
here but has anyone else verified this?
--
Daniel Gustafsson
Attachments:
v5-0001-Add-support-for-base64url-encoding-and-decoding.patch (application/octet-stream)
From fd34ff635fadc82312ea5b095d494caf76e636d6 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Sat, 12 Jul 2025 15:12:17 -0400
Subject: [PATCH v5] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with
'_', thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func.sgml | 19 ++++
src/backend/utils/adt/encode.c | 131 ++++++++++++++++++----
src/test/regress/expected/strings.out | 150 ++++++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++++
4 files changed, 336 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index de5b5929ee0..cad6b29dcd1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4999,6 +4999,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -5056,6 +5057,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ use <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..5b56ede016e 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,7 +288,7 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +296,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +310,58 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +379,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -374,8 +412,9 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (b < 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +431,39 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +478,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For Base64, each 4 characters of input produce at most 3 bytes of
+ * output. For Base64URL without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +695,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 1bfd33de3f3..1fa120d3e04 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2474,6 +2474,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 92c445c2439..3b9385a5fec 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -776,6 +776,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.39.3 (Apple Git-146)
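For readers who want to try the url=true encoding behavior outside the server, here is a minimal standalone sketch, assuming ASCII and a caller-supplied output buffer. The function name `sketch_b64url_encode` and its buffer contract are assumptions for illustration, not the patch's code. Like the patched encoder in url mode, it emits no line breaks and no '=' padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char b64url_chars[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/*
 * Hypothetical standalone encoder (not the patch's function): writes the
 * unpadded base64url form of src[0..len) into dst, NUL-terminates it, and
 * returns the number of characters written.  dst must have room for
 * (4 * len + 2) / 3 + 1 bytes.
 */
static size_t
sketch_b64url_encode(const unsigned char *src, size_t len, char *dst)
{
    char       *p = dst;
    size_t      i = 0;

    assert(dst != NULL);

    /* each full 3-byte group becomes 4 characters */
    for (; i + 3 <= len; i += 3)
    {
        unsigned long buf = ((unsigned long) src[i] << 16) |
            ((unsigned long) src[i + 1] << 8) | src[i + 2];

        *p++ = b64url_chars[(buf >> 18) & 0x3f];
        *p++ = b64url_chars[(buf >> 12) & 0x3f];
        *p++ = b64url_chars[(buf >> 6) & 0x3f];
        *p++ = b64url_chars[buf & 0x3f];
    }

    /* 1 or 2 leftover bytes become 2 or 3 characters, with no '=' */
    if (len - i == 1)
    {
        unsigned long buf = (unsigned long) src[i] << 16;

        *p++ = b64url_chars[(buf >> 18) & 0x3f];
        *p++ = b64url_chars[(buf >> 12) & 0x3f];
    }
    else if (len - i == 2)
    {
        unsigned long buf = ((unsigned long) src[i] << 16) |
            ((unsigned long) src[i + 1] << 8);

        *p++ = b64url_chars[(buf >> 18) & 0x3f];
        *p++ = b64url_chars[(buf >> 12) & 0x3f];
        *p++ = b64url_chars[(buf >> 6) & 0x3f];
    }

    *p = '\0';
    return (size_t) (p - dst);
}
```

Under these assumptions the unpadded output length is (4 * len + 2) / 3 characters, so 1, 2, 3 and 4 input bytes give 2, 3, 4 and 6 characters, in line with the regression tests in the patch. The patch's pg_base64url_enc_len returns the larger padded size, which appears to serve as an upper bound for allocation.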
On Jul 29, 2025, at 08:25, Daniel Gustafsson <daniel@yesql.se> wrote:
The attached version also adds a commit message, tweaks the documentation along
with a few small changes to error message handling etc.
This looks great. One nit my editor noticed: This line:
+-- Round-trip test for all lengths from 0–4
Uses U+2013 "–" but maybe we want ASCII character U+002d "-".
Best,
David
On Tue, Jul 29, 2025 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 12 Jul 2025, at 21:40, David E. Wheeler <david@justatheory.com> wrote:
Thank you! This looks great. The attached revision makes a couple of
minor changes:
I also had a look at this today and agree that it looks pretty close to being
done, and a feature we IMHO would like to have.
Thanks for having a look Daniel!
The attached version also adds a commit message, tweaks the documentation along
with a few small changes to error message handling etc.
In the doc snippet
The base64url alphabet use '-' instead of '+' and '_' instead of '/' and
also omits the '=' padding character.
Should be
The base64url alphabet use*s* '-' instead of '+' and '_' instead of '/'*,*
and also omits the '=' padding character.
I'd also add a comma before "and also"
The base64 code this extends is the RFC 2045 variant while base64url is based
on base64 from RFC 3548 (obsoleted by RFC 4648). AFAICT this is not a problem
here but has anyone else verified this?
I don't see how this can be a problem in practice.
The conversions are straightforward,
and the codepath used with url=true is a new one and doesn't change past
behavior.
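To illustrate how small the conversion is at the text level, the sketch below rewrites base64 output as unpadded base64url in place. The helper name `sketch_b64_to_b64url` is hypothetical, and dropping every '=' and line break regardless of position is a simplification. It is not how the patch is implemented; the patch selects the alphabet while encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite NUL-terminated base64 text in place as
 * unpadded base64url and return the new length.  Line breaks and '='
 * characters are dropped wherever they occur (a simplification).
 */
static size_t
sketch_b64_to_b64url(char *s)
{
    char       *out = s;

    assert(s != NULL);
    for (const char *in = s; *in != '\0'; in++)
    {
        if (*in == '\n' || *in == '\r' || *in == '=')
            continue;           /* drop line breaks and padding */
        else if (*in == '+')
            *out++ = '-';
        else if (*in == '/')
            *out++ = '_';
        else
            *out++ = *in;
    }
    *out = '\0';
    return (size_t) (out - s);
}
```

The output never grows, so rewriting in place is safe: each input character is either dropped or replaced by exactly one character.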
On 1 Aug 2025, at 1:13 PM, Florents Tselai <florents.tselai@gmail.com> wrote:
On Tue, Jul 29, 2025 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 12 Jul 2025, at 21:40, David E. Wheeler <david@justatheory.com> wrote:
Thank you! This looks great. The attached revision makes a couple of minor changes:
I also had a look at this today and agree that it looks pretty close to being
done, and a feature we IMHO would like to have.
Thanks for having a look Daniel!
The attached version also adds a commit message, tweaks the documentation along
with a few small changes to error message handling etc.
In the doc snippet
The base64url alphabet use '-' instead of '+' and '_' instead of '/' and also omits the '=' padding character.
Should be
The base64url alphabet uses '-' instead of '+' and '_' instead of '/', and also omits the '=' padding character.
I'd also add a comma before "and also"
The base64 code this extends is the RFC 2045 variant while base64url is based
on base64 from RFC 3548 (obsoleted by RFC 4648). AFAICT this is not a problem
here but has anyone else verified this?
I don't see how this can be a problem in practice.
The conversions are straightforward,
and the codepath used with url=true is a new one and doesn't change past behavior.
Here’s a v6, necessary because func.sgml was split.
No other changes compared to v5.

Attachments:
v6-0001-Add-support-for-base64url-encoding-and-decoding.patch (application/octet-stream)
From 7ec8135ae6f132e511f93adf665fa9698061bb12 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Tue, 5 Aug 2025 12:36:59 +0300
Subject: [PATCH v6] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func/func-binarystring.sgml | 19 +++
src/backend/utils/adt/encode.c | 131 +++++++++++++++++---
src/test/regress/expected/strings.out | 150 +++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 336 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 78814ee0685..22df995ec54 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -728,6 +728,7 @@
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -785,6 +786,24 @@
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ use <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..5b56ede016e 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,7 +288,7 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +296,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +310,58 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +379,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -374,8 +412,9 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (b < 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +431,39 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +478,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For Base64, each 4 characters of input produce at most 3 bytes of
+ * output. For Base64URL without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +695,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 1bfd33de3f3..c804a6199ae 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2474,6 +2474,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 92c445c2439..3b9385a5fec 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -776,6 +776,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.49.0
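The end-of-input handling in the decoder can be tried in isolation. The following is a minimal sketch, assuming ASCII input with no whitespace and no '=' padding (the patch itself accepts both). The names `sketch_b64url_value` and `sketch_b64url_decode` and the -1 error return are assumptions for illustration. Two leftover characters yield one byte, three yield two bytes, and a single leftover character is rejected, mirroring the pos == 2, pos == 3 and error branches in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64url character to its 6-bit value, or -1 if invalid. */
static int
sketch_b64url_value(char c)
{
    if (c >= 'A' && c <= 'Z')
        return c - 'A';
    if (c >= 'a' && c <= 'z')
        return c - 'a' + 26;
    if (c >= '0' && c <= '9')
        return c - '0' + 52;
    if (c == '-')
        return 62;
    if (c == '_')
        return 63;
    return -1;
}

/*
 * Hypothetical standalone decoder for unpadded base64url.  Returns the
 * number of bytes written to dst, or -1 on an invalid character or a lone
 * trailing character.  dst must have room for (len * 3) / 4 bytes.
 */
static long
sketch_b64url_decode(const char *src, size_t len, unsigned char *dst)
{
    unsigned long buf = 0;
    int         pos = 0;        /* characters collected in current group */
    long        n = 0;

    assert(dst != NULL);
    for (size_t i = 0; i < len; i++)
    {
        int         v = sketch_b64url_value(src[i]);

        if (v < 0)
            return -1;
        buf = (buf << 6) | (unsigned long) v;
        if (++pos == 4)
        {
            /* a full group of 4 characters yields 3 bytes */
            dst[n++] = (buf >> 16) & 0xff;
            dst[n++] = (buf >> 8) & 0xff;
            dst[n++] = buf & 0xff;
            buf = 0;
            pos = 0;
        }
    }

    if (pos == 2)
    {
        /* 12 bits collected: one byte */
        buf <<= 12;
        dst[n++] = (buf >> 16) & 0xff;
    }
    else if (pos == 3)
    {
        /* 18 bits collected: two bytes */
        buf <<= 6;
        dst[n++] = (buf >> 16) & 0xff;
        dst[n++] = (buf >> 8) & 0xff;
    }
    else if (pos == 1)
        return -1;              /* 6 bits cannot form a byte */

    return n;
}
```

A lone trailing character carries only 6 bits, which is why an input length of 1 modulo 4 can never be valid, padded or not.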
Attaching v6 again because it wasn't picked up the last time.
Trying from Gmail's web page this time.
On Tue, Aug 5, 2025 at 12:40 PM Florents Tselai <florents.tselai@gmail.com>
wrote:
On 1 Aug 2025, at 1:13 PM, Florents Tselai <florents.tselai@gmail.com> wrote:
On Tue, Jul 29, 2025 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 12 Jul 2025, at 21:40, David E. Wheeler <david@justatheory.com> wrote:
Thank you! This looks great. The attached revision makes a couple of
minor changes:
I also had a look at this today and agree that it looks pretty close to being
done, and a feature we IMHO would like to have.
Thanks for having a look Daniel!
The attached version also adds a commit message, tweaks the documentation along
with a few small changes to error message handling etc.
In the doc snippet
The base64url alphabet use '-' instead of '+' and '_' instead of '/' and
also omits the '=' padding character.
Should be
The base64url alphabet use*s* '-' instead of '+' and '_' instead of '/'*,*
and also omits the '=' padding character.
I'd also add a comma before "and also"
The base64 code this extends is the RFC 2045 variant while base64url is based
on base64 from RFC 3548 (obsoleted by RFC 4648). AFAICT this is not a problem
here but has anyone else verified this?
I don't see how this can be a problem in practice.
The conversions are straightforward,
and the codepath used with url=true is a new one and doesn't change past
behavior.
Here’s a v6, necessary because func.sgml was split.
No other changes compared to v5.
Attachments:
v6-0001-Add-support-for-base64url-encoding-and-decoding.patch (application/octet-stream)
From 7ec8135ae6f132e511f93adf665fa9698061bb12 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Tue, 5 Aug 2025 12:36:59 +0300
Subject: [PATCH v6] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func/func-binarystring.sgml | 19 +++
src/backend/utils/adt/encode.c | 131 +++++++++++++++++---
src/test/regress/expected/strings.out | 150 +++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 336 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 78814ee0685..22df995ec54 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -728,6 +728,7 @@
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -785,6 +786,24 @@
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ use <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..5b56ede016e 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,7 +288,7 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +296,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +310,58 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +379,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -374,8 +412,9 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (b < 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +431,39 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +478,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For Base64, each 4 characters of input produce at most 3 bytes of
+ * output. For Base64URL without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +695,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 1bfd33de3f3..c804a6199ae 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2474,6 +2474,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 92c445c2439..3b9385a5fec 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -776,6 +776,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.49.0
On Wed, Aug 6, 2025 at 4:34 PM Florents Tselai <florents.tselai@gmail.com> wrote:
Attaching v6 again because it wasn't picked up the last time.
Trying from Gmail's web page this time.
On Tue, Aug 5, 2025 at 12:40 PM Florents Tselai <florents.tselai@gmail.com> wrote:
On 1 Aug 2025, at 1:13 PM, Florents Tselai <florents.tselai@gmail.com> wrote:
On Tue, Jul 29, 2025 at 3:25 PM Daniel Gustafsson <daniel@yesql.se> wrote:
On 12 Jul 2025, at 21:40, David E. Wheeler <david@justatheory.com> wrote:
Thank you! This looks great. The attached revision makes a couple of
minor changes:
I also had a look at this today and agree that it looks pretty close to
being done, and a feature we IMHO would like to have.
Thanks for having a look, Daniel!
The attached version also adds a commit message, tweaks the documentation
along with a few small changes to error message handling etc.
In the doc snippet
The base64url alphabet use '-' instead of '+' and '_' instead
of '/' and also omits the '=' padding character.
Should be
The base64url alphabet uses '-' instead of '+' and '_' instead of '/',
and also omits the '=' padding character.
I'd also add a comma before "and also"
The base64 code this extends is the RFC 2045 variant while base64url is
based on base64 from RFC 3548 (obsoleted by RFC 4648). AFAICT this is not a
problem here but has anyone else verified this?
I don't see how this can be a problem in practice.
The conversions are straightforward,
and the codepath used with url=true is a new one and doesn't change past
behavior.
Here's a v6; necessary because func.sgml was split.
No other changes compared to v5.
v6 introduced some whitespace errors in the regression files.
Here's a v7 that fixes that
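To illustrate why the conversions are described above as straightforward, here is a minimal standalone sketch of the mapping in the doc snippet: swap '+' for '-' and '/' for '_', and drop the '=' padding. The helper name base64_to_base64url is hypothetical and is not part of the patch, which instead selects a second alphabet inside the encoder. Skipping line breaks is an assumption based on the patch not emitting newlines when url is true.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: rewrite a standard base64
 * string into base64url.  "out" must have room for strlen(in) + 1 bytes.
 * Returns the number of characters written, excluding the terminator.
 */
static size_t
base64_to_base64url(const char *in, char *out)
{
	size_t		n = 0;

	assert(in != NULL && out != NULL);

	for (; *in != '\0'; in++)
	{
		char		c = *in;

		/* base64url output carries no padding and no line breaks */
		if (c == '=' || c == '\n' || c == '\r' || c == ' ' || c == '\t')
			continue;

		/* swap the two characters that differ between the alphabets */
		if (c == '+')
			c = '-';
		else if (c == '/')
			c = '_';

		out[n++] = c;
	}
	out[n] = '\0';
	return n;
}
```

With this sketch, the standard base64 form of \x69b73eff, "abc+/w==", maps to "abc-_w", which matches the expected output in the regression tests.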
Attachments:
v7-0001-Add-support-for-base64url-encoding-and-decoding.patch
From 029276804c7ae74833886023728478e104e2f16d Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Wed, 6 Aug 2025 21:58:26 +0300
Subject: [PATCH v7] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func/func-binarystring.sgml | 19 +++
src/backend/utils/adt/encode.c | 131 +++++++++++++++++---
src/test/regress/expected/strings.out | 150 +++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 336 insertions(+), 18 deletions(-)
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 78814ee0685..22df995ec54 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -728,6 +728,7 @@
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -785,6 +786,24 @@
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ use <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..5b56ede016e 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,7 +288,7 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +296,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +310,58 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +379,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -374,8 +412,9 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (b < 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +431,39 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +478,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For Base64, each 4 characters of input produce at most 3 bytes of
+ * output. For Base64URL without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +695,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index ba302da51e7..ac5ded4e26e 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2508,6 +2508,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+ERROR: invalid base64 end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64 sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b94004cc08c..2247bd74751 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -796,6 +796,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64 end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.49.0
On Wed, Aug 6, 2025 at 12:43 PM Florents Tselai
<florents.tselai@gmail.com> wrote:
v6 introduced some whitespace errors in the regression files.
Here's a v7 that fixes that
While the patch looks good to me, I have one question:
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
The above change makes the error message mention the encoding name
properly. On the other hand, in pg_base64_decode_internal() there are
two places where we report invalid data and always mention 'base64'
in the error message:
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("unexpected \"=\" while decoding base64 sequence")));
and
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
Do we need to have a similar change for these messages?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
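One way the two remaining messages could take the encoding name as a parameter is sketched below. This is a standalone illustration and not the code from any posted patch version: it uses snprintf in place of ereport/errmsg so that it compiles outside the backend, and encoding_name, format_unexpected_padding and format_invalid_end are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the name to show in messages for each variant. */
static const char *
encoding_name(bool url)
{
	return url ? "base64url" : "base64";
}

/* Stand-in for the first errmsg(): a misplaced '=' in the input. */
static void
format_unexpected_padding(char *buf, size_t buflen, bool url)
{
	snprintf(buf, buflen, "unexpected \"=\" while decoding %s sequence",
			 encoding_name(url));
}

/* Stand-in for the second errmsg(): a truncated or corrupt tail. */
static void
format_invalid_end(char *buf, size_t buflen, bool url)
{
	snprintf(buf, buflen, "invalid %s end sequence", encoding_name(url));
}
```

In the backend, the same "%s" substitution would presumably go straight into the errmsg() format strings, as the patch already does for the invalid-symbol message.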
On Wed, Sep 17, 2025 at 12:56 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
While the patch looks good to me, I have one question:
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
The above change makes the error message mention the encoding name
properly. On the other hand, in pg_base64_decode_internal() there are
two places where we report invalid data and always mention 'base64'
in the error message:
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("unexpected \"=\" while decoding base64 sequence")));
and
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid base64 end sequence"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
Do we need to have a similar change for these messages?
Good catch, Masahiko-san. They shouldn't be hardcoded either.
I've updated that, along with the wording in the regression tests.
Attachments:
v8-0001-Add-support-for-base64url-encoding-and-decoding.patch
From 313b1b454f0d50fa8fe397d6cb880121ddaaf85c Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Wed, 17 Sep 2025 15:43:25 +0300
Subject: [PATCH v8] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func/func-binarystring.sgml | 19 +++
src/backend/utils/adt/encode.c | 135 +++++++++++++++++---
src/test/regress/expected/strings.out | 150 +++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 338 insertions(+), 20 deletions(-)
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 78814ee0685..22df995ec54 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -728,6 +728,7 @@
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -785,6 +786,24 @@
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ use <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..94ea907ef0e 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -273,6 +273,9 @@ hex_dec_len(const char *src, size_t srclen)
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -285,7 +288,7 @@ static const int8 b64lookup[128] = {
};
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +296,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +310,58 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +379,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -362,7 +400,7 @@ pg_base64_decode(const char *src, size_t len, char *dst)
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
}
b = 0;
}
@@ -374,8 +412,9 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (b < 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +431,39 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64 end sequence"),
+ errmsg("invalid %s end sequence", url ? "base64url" : "base64"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +478,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For Base64, each 4 characters of input produce at most 3 bytes of
+ * output. For Base64URL without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +695,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 2d6cb02ad60..cc551e355cb 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2517,6 +2517,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64url end sequence
+ERROR: invalid base64url end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64url sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 5ed421d6205..3b72d8b69e8 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -799,6 +799,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- Base64URL encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64url end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.49.0
On Wed, Sep 17, 2025 at 5:57 AM Florents Tselai
<florents.tselai@gmail.com> wrote:
Good catch, Masahiko-san. They shouldn't be hardcoded either.
I've updated that and also the wording in the regression tests, too.
Thank you for updating the patch! I've done additional tests in my
environment and all test cases passed. One very minor comment is that
we might want to add 'BASE64URL' to:
/*
* BASE64
*/
Overall, the patch looks good to me. I'll wait for Daniel as he has
polished this patch and might have some comments or want to take it.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 17 Sep 2025, at 19:51, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Thank you for updating the patch! I've done additional tests in my
environment and all test cases passed. One very minor comment is that
we might want to add 'BASE64URL' to:
/*
* BASE64
*/
I did that, and polished a few comments which had various casings of
"base64url". The RFC only mentions it in all lowercase, so I went with that
apart from in the comment mentioned above.
The attached v9 has this and
Overall, the patch looks good to me. I'll wait for Daniel as he has
polished this patch and might have some comments or want to take it.
Agreed, I think this is ready to go in. If you want to push it then feel free,
else I'll take care of it tomorrow.
--
Daniel Gustafsson
Attachments:
v9-0001-Add-support-for-base64url-encoding-and-decoding.patch
From 0681c94189e70506017835e55d86527fe71ff7ad Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Thu, 18 Sep 2025 21:16:04 +0200
Subject: [PATCH v9] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func/func-binarystring.sgml | 19 +++
src/backend/utils/adt/encode.c | 150 +++++++++++++++++++----
src/test/regress/expected/strings.out | 150 +++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 352 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 78814ee0685..9bab965f288 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -728,6 +728,7 @@
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -785,6 +786,24 @@
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ use <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..aa209d233c2 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -267,12 +267,15 @@ hex_dec_len(const char *src, size_t srclen)
}
/*
- * BASE64
+ * BASE64 and BASE64URL
*/
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -284,8 +287,15 @@ static const int8 b64lookup[128] = {
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
};
+/*
+ * pg_base64_encode_internal
+ *
+ * Helper for decoding base64 or base64url. When url is passed as true the
+ * input will be encoded using base64url. len bytes in src is encoded into
+ * dst.
+ */
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +303,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +317,64 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* handle remainder */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+/*
+ * pg_base64_decode_internal
+ *
+ * Helper for decoding base64 or base64url. When url is passed as true the
+ * input will be assumed to be encoded using base64url.
+ */
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +392,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -362,7 +413,7 @@ pg_base64_decode(const char *src, size_t len, char *dst)
else
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
}
b = 0;
}
@@ -374,8 +425,9 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (b < 0)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +444,39 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64 end sequence"),
+ errmsg("invalid %s end sequence", url ? "base64url" : "base64"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +491,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For base64, each 4 characters of input produce at most 3 bytes of
+ * output. For base64url without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +708,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 2d6cb02ad60..691e475bce3 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2517,6 +2517,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- base64url encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64url end sequence
+ERROR: invalid base64url end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64url sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 5ed421d6205..c05f3413699 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -799,6 +799,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- base64url encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64url end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.39.3 (Apple Git-146)
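The "incomplete group" tests at the end of the patch follow from how 6-bit symbols map onto bytes: a trailing group of 2 symbols carries one byte, 3 symbols carry two bytes, and a single trailing symbol cannot carry a whole byte. A minimal sketch of that rule is below; the helper name b64url_unpadded_decoded_len is hypothetical and not part of the patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): number of bytes that an
 * unpadded base64url string of nsyms symbols decodes to, or -1 when no
 * byte string can encode to that many symbols.
 */
static long
b64url_unpadded_decoded_len(size_t nsyms)
{
    size_t full = nsyms / 4;    /* complete 4-symbol groups, 3 bytes each */
    size_t tail = nsyms % 4;    /* symbols left over in the last group */

    switch (tail)
    {
        case 0:
            return (long) (full * 3);
        case 2:
            return (long) (full * 3 + 1);   /* 12 bits hold 1 byte */
        case 3:
            return (long) (full * 3 + 2);   /* 18 bits hold 2 bytes */
        default:
            return -1;                      /* 6 bits hold no full byte */
    }
}
```

Under this rule 'QQ' (2 symbols) and 'QQI' (3 symbols) are decodable, while 'QQIDQ' (5 symbols) is not, which matches the expected output in the regression tests.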
I reviewed and tested this patch. Overall it looks good to me. Actually, I think this patch fixes a bug in the current implementation of base64 encoding by moving the newline-handling logic into “if (pos<0)”.
Just a few small comments:
On Sep 19, 2025, at 03:19, Daniel Gustafsson <daniel@yesql.se> wrote:
<v9-0001-Add-support-for-base64url-encoding-and-decoding.patch>
1.
```
+ * Helper for decoding base64 or base64url. When url is passed as true the
+ * input will be encoded using base64url. len bytes in src is encoded into
+ * dst.
+ */
```
It’s not common to use two spaces after “.”; usually one is enough.
2.
```
+ /* handle remainder */
if (pos != 2)
```
The comment is understandable, but slightly vague: remainder of what?
Maybe rephrase to “handle remaining bytes in buf”.
3.
```
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
```
This is a common pattern that injects substrings based on a condition. However, PG discourages it, see here: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 19 Sep 2025, at 6:50 AM, Chao Li <li.evan.chao@gmail.com> wrote:
Great to see you around Evan!
I reviewed and tested this patch. Overall it looks good to me. Actually, I think this patch fixes a bug in the current implementation of base64 encoding by moving the newline-handling logic into “if (pos<0)”.
IIUC what you mean, I can’t confirm that.
Both the existing and the new implementation handle newlines the same way:
SELECT decode(E'QUFB\nQUFB', 'base64url');
decode
----------------
\x414141414141
(1 row)
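The query above succeeds because the decoder discards whitespace before it classifies a symbol. As a rough illustration only, here is a minimal unpadded base64url decoder that does the same. The names sym_value and b64url_decode are hypothetical, '=' padding is not handled, and this is not the code from the patch.

```c
#include <stddef.h>

/* Map one base64url symbol to its 6-bit value, or -1 (assumes ASCII). */
static int
sym_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '-') return 62;
    if (c == '_') return 63;
    return -1;
}

/*
 * Decode unpadded base64url, skipping whitespace.  Returns the number of
 * bytes written to dst, or -1 on an invalid symbol or impossible length.
 * dst needs room for 3 bytes per 4 input characters, rounded up.
 */
static long
b64url_decode(const char *src, size_t len, unsigned char *dst)
{
    unsigned long buf = 0;
    int nsyms = 0;
    long out = 0;

    for (size_t i = 0; i < len; i++)
    {
        char c = src[i];
        int v;

        /* whitespace never reaches the symbol lookup */
        if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            continue;

        v = sym_value(c);
        if (v < 0)
            return -1;

        buf = (buf << 6) | (unsigned long) v;
        if (++nsyms == 4)
        {
            dst[out++] = (buf >> 16) & 0xFF;
            dst[out++] = (buf >> 8) & 0xFF;
            dst[out++] = buf & 0xFF;
            buf = 0;
            nsyms = 0;
        }
    }

    if (nsyms == 2)             /* 12 bits: one byte */
        dst[out++] = (buf >> 4) & 0xFF;
    else if (nsyms == 3)        /* 18 bits: two bytes */
    {
        dst[out++] = (buf >> 10) & 0xFF;
        dst[out++] = (buf >> 2) & 0xFF;
    }
    else if (nsyms == 1)        /* 6 bits: not a full byte */
        return -1;

    return out;
}
```

Calling b64url_decode on the 9-character input "QUFB\nQUFB" writes six 0x41 bytes, the same bytes as the psql output above.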
Just a few small comments:
On Sep 19, 2025, at 03:19, Daniel Gustafsson <daniel@yesql.se> wrote:
<v9-0001-Add-support-for-base64url-encoding-and-decoding.patch>
1.
```
+ * Helper for decoding base64 or base64url. When url is passed as true the
+ * input will be encoded using base64url. len bytes in src is encoded into
+ * dst.
+ */
```
It’s not common to use two white-spaces after “.”, usually we need only one.
I agree with this
2.
```
+ /* handle remainder */
if (pos != 2)
```
The comment is understandable, but slightly vague: remainder of what?
Maybe rephrase to “handle remaining bytes in buf”.
Agree too.
3.
```
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
```
This is a normal usage that injects sub-strings based on condition. However, PG doesn’t like that, see here: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
Well, that’s a very interesting catch.
I’ll let a committer confirm & advise.
On Sep 19, 2025, at 14:45, Florents Tselai <florents.tselai@gmail.com> wrote:
I reviewed and tested this patch. Overall it looks good to me. Actually, I think this patch fixes a bug in the current implementation of base64 encoding by moving the newline-handling logic into “if (pos<0)”.
IIUC what you mean, I can’t confirm that.
Both the existing and the new implementation handle newlines the same way:
SELECT decode(E'QUFB\nQUFB', 'base64url');
decode
----------------
\x414141414141
(1 row)
The current implementation isn’t actually wrong, just less optimized than your version: “if (p >= lend)” only needs to be checked after p has been bumped, and p is only bumped inside “if (pos < 0)”.
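To illustrate the point about the line-break check: the output pointer only advances when a full 3-byte group is flushed, so that branch is the only place where the line limit can be reached. The sketch below is a simplified padded base64 encoder written for this explanation. The name b64_encode_wrapped and the width parameter are assumptions, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

static const char b64_alphabet[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encode len bytes of src as padded base64, writing '\n' after every
 * `width` output characters.  Returns the number of characters written;
 * dst is not NUL-terminated and must be large enough for the result.
 */
static size_t
b64_encode_wrapped(const unsigned char *src, size_t len, char *dst,
                   size_t width)
{
    char *p = dst;
    char *lend;
    unsigned long buf = 0;
    int pos = 2;

    /* groups are 4 characters, so the limit must fall on a group edge */
    assert(width > 0 && width % 4 == 0);
    lend = dst + width;

    for (size_t i = 0; i < len; i++)
    {
        buf |= (unsigned long) src[i] << (pos << 3);
        pos--;

        if (pos < 0)
        {
            *p++ = b64_alphabet[(buf >> 18) & 0x3f];
            *p++ = b64_alphabet[(buf >> 12) & 0x3f];
            *p++ = b64_alphabet[(buf >> 6) & 0x3f];
            *p++ = b64_alphabet[buf & 0x3f];
            pos = 2;
            buf = 0;

            /* p moved only in this branch, so the check lives here */
            if (p >= lend)
            {
                *p++ = '\n';
                lend = p + width;
            }
        }
    }

    /* remaining 1 or 2 bytes in buf, with '=' padding */
    if (pos != 2)
    {
        *p++ = b64_alphabet[(buf >> 18) & 0x3f];
        *p++ = b64_alphabet[(buf >> 12) & 0x3f];
        *p++ = (pos == 0) ? b64_alphabet[(buf >> 6) & 0x3f] : '=';
        *p++ = '=';
    }

    return (size_t) (p - dst);
}
```

With width set to 76 this mirrors the line length used by the existing encoder; a small width such as 4 makes the wrapping easy to observe in a test.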
3.
```
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
```
This is a normal usage that injects sub-strings based on condition. However, PG doesn’t like that, see here: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
Well, that’s a very interesting catch.
I’ll let a committer confirm & advise.
I learned this when I once reviewed a patch by Tom Lane. It had a similar situation, but Tom wrote code like:
```
if (something)
    ereport("function xxx")
else
    ereport("procedure xxx")
```
I suggested avoiding the duplicated code the way your code does, and the response was “no”, along with that link.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 19 Sep 2025, at 08:45, Florents Tselai <florents.tselai@gmail.com> wrote:
On 19 Sep 2025, at 6:50 AM, Chao Li <li.evan.chao@gmail.com> wrote:
It’s not common to use two white-spaces after “.”, usually we need only one.
I agree with this
This might date me (and others) but double-space after period was the norm for
monospaced typesetting back in the clackety-clack typewriter days, and that
carried over into monospace font text in computers. The fmt program still uses
double-space after period (which is what formatted my reply here, thus the use
of double-space in my emails). While there is no hard rule in postgres
(AFAIK), a quick regex shows that it's 2.5x more common for sentences in
comments to have two spaces after punctuation.
The comment is understandable, but slightly vague: remainder of what?
Maybe rephrase to “handle remaining bytes in buf”.
Agree too.
I don't think the comment was all that vague in the context of reading the
code, but expanding won't hurt so done.
```
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
```
This is a normal usage that injects sub-strings based on condition. However, PG doesn’t like that, see here: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
Well, that’s a very interesting catch.
I’ll let a committer confirm & advise.
Yes and no, the recommendation against constructing sentences at runtime is to
aid translators since the injected string isn't available to them. In this
(and I hope all other) case the injected string should not be translated as it
is a name of an encoding scheme. What we can do is to add a /* translator: ..
*/ comment which will end up in the translation file and give the translator
context on what %s will be replaced by. Done in the attached.
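As a rough illustration of that pattern outside the backend, the sketch below keeps a single format string and injects the untranslated scheme name through %s, with a translator comment placed directly above the message. snprintf stands in for ereport()/errmsg() here, and the function name format_unexpected_pad_error is hypothetical. No gettext machinery is involved in this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Stand-in for ereport()/errmsg(): formats the message into out and
 * returns what snprintf returns.  Only one translatable string exists;
 * the scheme name is a fixed identifier and is passed as an argument.
 */
static int
format_unexpected_pad_error(char *out, size_t outlen, bool url)
{
    /* translator: %s is an encoding scheme name, base64 or base64url */
    return snprintf(out, outlen,
                    "unexpected \"=\" while decoding %s sequence",
                    url ? "base64url" : "base64");
}
```

A translator working from this would see one message plus the comment, and could move %s to wherever the target language's grammar needs it.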
--
Daniel Gustafsson
Attachments:
v10-0001-Add-support-for-base64url-encoding-and-decoding.patch
From 5d4b2bc9b8eed5628512285b19b36b03d038045a Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Fri, 19 Sep 2025 09:14:48 +0200
Subject: [PATCH v10] Add support for base64url encoding and decoding
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This adds support for base64url encoding and decoding, a base64
variant which is safe to use in filenames and URLs. base64url
replaces '+' in the base64 alphabet with '-' and '/' with '_',
thus making it safe for URL addresses and file systems.
Support for base64url was originally suggested by Przemysław Sztoch.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: David E. Wheeler <david@justatheory.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Chao Li (Evan) <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/70f2b6a8-486a-4fdb-a951-84cef35e22ab@sztoch.pl
---
doc/src/sgml/func/func-binarystring.sgml | 19 +++
src/backend/utils/adt/encode.c | 157 ++++++++++++++++++++---
src/test/regress/expected/strings.out | 150 ++++++++++++++++++++++
src/test/regress/sql/strings.sql | 54 ++++++++
4 files changed, 359 insertions(+), 21 deletions(-)
diff --git a/doc/src/sgml/func/func-binarystring.sgml b/doc/src/sgml/func/func-binarystring.sgml
index 78814ee0685..9bab965f288 100644
--- a/doc/src/sgml/func/func-binarystring.sgml
+++ b/doc/src/sgml/func/func-binarystring.sgml
@@ -728,6 +728,7 @@
Encodes binary data into a textual representation; supported
<parameter>format</parameter> values are:
<link linkend="encode-format-base64"><literal>base64</literal></link>,
+ <link linkend="encode-format-base64url"><literal>base64url</literal></link>,
<link linkend="encode-format-escape"><literal>escape</literal></link>,
<link linkend="encode-format-hex"><literal>hex</literal></link>.
</para>
@@ -785,6 +786,24 @@
</listitem>
</varlistentry>
+ <varlistentry id="encode-format-base64url">
+ <term>base64url
+ <indexterm>
+ <primary>base64url format</primary>
+ </indexterm></term>
+ <listitem>
+ <para>
+ The <literal>base64url</literal> format is that of
+ <ulink url="https://datatracker.ietf.org/doc/html/rfc4648#section-5">
+ RFC 4648 Section 5</ulink>, a <literal>base64</literal> variant safe to
+ use in filenames and URLs. The <literal>base64url</literal> alphabet
+ uses <literal>'-'</literal> instead of <literal>'+'</literal> and
+ <literal>'_'</literal> instead of <literal>'/'</literal> and also omits
+ the <literal>'='</literal> padding character.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="encode-format-escape">
<term>escape
<indexterm>
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 4ccaed815d1..589c34f6365 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -267,12 +267,15 @@ hex_dec_len(const char *src, size_t srclen)
}
/*
- * BASE64
+ * BASE64 and BASE64URL
*/
static const char _base64[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+static const char _base64url[] =
+"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
+
static const int8 b64lookup[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -284,8 +287,15 @@ static const int8 b64lookup[128] = {
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1,
};
+/*
+ * pg_base64_encode_internal
+ *
+ * Helper for encoding base64 or base64url. When url is passed as true the
+ * input will be encoded using base64url. len bytes in src are encoded into
+ * dst.
+ */
static uint64
-pg_base64_encode(const char *src, size_t len, char *dst)
+pg_base64_encode_internal(const char *src, size_t len, char *dst, bool url)
{
char *p,
*lend = dst + 76;
@@ -293,6 +303,7 @@ pg_base64_encode(const char *src, size_t len, char *dst)
*end = src + len;
int pos = 2;
uint32 buf = 0;
+ const char *alphabet = url ? _base64url : _base64;
s = src;
p = dst;
@@ -306,33 +317,64 @@ pg_base64_encode(const char *src, size_t len, char *dst)
/* write it out */
if (pos < 0)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = _base64[(buf >> 6) & 0x3f];
- *p++ = _base64[buf & 0x3f];
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ *p++ = alphabet[buf & 0x3f];
pos = 2;
buf = 0;
- }
- if (p >= lend)
- {
- *p++ = '\n';
- lend = p + 76;
+
+ if (!url && p >= lend)
+ {
+ *p++ = '\n';
+ lend = p + 76;
+ }
}
}
+
+ /* Handle remaining bytes in buf */
if (pos != 2)
{
- *p++ = _base64[(buf >> 18) & 0x3f];
- *p++ = _base64[(buf >> 12) & 0x3f];
- *p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
- *p++ = '=';
+ *p++ = alphabet[(buf >> 18) & 0x3f];
+ *p++ = alphabet[(buf >> 12) & 0x3f];
+
+ if (pos == 0)
+ {
+ *p++ = alphabet[(buf >> 6) & 0x3f];
+ if (!url)
+ *p++ = '=';
+ }
+ else if (!url)
+ {
+ *p++ = '=';
+ *p++ = '=';
+ }
}
return p - dst;
}
static uint64
-pg_base64_decode(const char *src, size_t len, char *dst)
+pg_base64_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_encode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_encode_internal(src, len, dst, true);
+}
+
+/*
+ * pg_base64_decode_internal
+ *
+ * Helper for decoding base64 or base64url. When url is passed as true the
+ * input will be assumed to be encoded using base64url.
+ */
+static uint64
+pg_base64_decode_internal(const char *src, size_t len, char *dst, bool url)
{
const char *srcend = src + len,
*s = src;
@@ -350,6 +392,15 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
continue;
+ /* convert base64url to base64 */
+ if (url)
+ {
+ if (c == '-')
+ c = '+';
+ else if (c == '_')
+ c = '/';
+ }
+
if (c == '=')
{
/* end sequence */
@@ -360,9 +411,12 @@ pg_base64_decode(const char *src, size_t len, char *dst)
else if (pos == 3)
end = 2;
else
+ {
+ /* translator: %s is an encoding scheme, either base64 or base64url */
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("unexpected \"=\" while decoding base64 sequence")));
+ errmsg("unexpected \"=\" while decoding %s sequence", url ? "base64url" : "base64")));
+ }
}
b = 0;
}
@@ -372,10 +426,14 @@ pg_base64_decode(const char *src, size_t len, char *dst)
if (c > 0 && c < 127)
b = b64lookup[(unsigned char) c];
if (b < 0)
+ {
+ /* translator: %s is an encoding scheme, either base64 or base64url */
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid symbol \"%.*s\" found while decoding base64 sequence",
- pg_mblen(s - 1), s - 1)));
+ errmsg("invalid symbol \"%.*s\" found while decoding %s sequence",
+ pg_mblen(s - 1), s - 1,
+ url ? "base64url" : "base64")));
+ }
}
/* add it to buffer */
buf = (buf << 6) + b;
@@ -392,15 +450,40 @@ pg_base64_decode(const char *src, size_t len, char *dst)
}
}
- if (pos != 0)
+ if (pos == 2)
+ {
+ buf <<= 12;
+ *p++ = (buf >> 16) & 0xFF;
+ }
+ else if (pos == 3)
+ {
+ buf <<= 6;
+ *p++ = (buf >> 16) & 0xFF;
+ *p++ = (buf >> 8) & 0xFF;
+ }
+ else if (pos != 0)
+ {
+ /* translator: %s is an encoding scheme, either base64 or base64url */
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("invalid base64 end sequence"),
+ errmsg("invalid %s end sequence", url ? "base64url" : "base64"),
errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
+ }
return p - dst;
}
+static uint64
+pg_base64_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, false);
+}
+
+static uint64
+pg_base64url_decode(const char *src, size_t len, char *dst)
+{
+ return pg_base64_decode_internal(src, len, dst, true);
+}
static uint64
pg_base64_enc_len(const char *src, size_t srclen)
@@ -415,6 +498,32 @@ pg_base64_dec_len(const char *src, size_t srclen)
return ((uint64) srclen * 3) >> 2;
}
+static uint64
+pg_base64url_enc_len(const char *src, size_t srclen)
+{
+ /*
+ * Unlike standard base64, base64url doesn't use padding characters when
+ * the input length is not divisible by 3
+ */
+ return (srclen + 2) / 3 * 4;
+}
+
+static uint64
+pg_base64url_dec_len(const char *src, size_t srclen)
+{
+ /*
+ * For base64, each 4 characters of input produce at most 3 bytes of
+ * output. For base64url without padding, we need to round up to the
+ * nearest 4
+ */
+ size_t adjusted_len = srclen;
+
+ if (srclen % 4 != 0)
+ adjusted_len += 4 - (srclen % 4);
+
+ return (adjusted_len * 3) / 4;
+}
+
/*
* Escape
* Minimally escape bytea to text.
@@ -606,6 +715,12 @@ static const struct
pg_base64_enc_len, pg_base64_dec_len, pg_base64_encode, pg_base64_decode
}
},
+ {
+ "base64url",
+ {
+ pg_base64url_enc_len, pg_base64url_dec_len, pg_base64url_encode, pg_base64url_decode
+ }
+ },
{
"escape",
{
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 2d6cb02ad60..691e475bce3 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2517,6 +2517,156 @@ SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
\x1234567890abcdef00
(1 row)
+--
+-- base64url encoding/decoding
+--
+SET bytea_output TO hex;
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+ encode
+--------
+ abc-_w
+(1 row)
+
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+ decode
+----------------------
+ \x1234567890abcdef00
+(1 row)
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+ encode
+--------
+
+(1 row)
+
+SELECT decode('', 'base64url'); -- ''
+ decode
+--------
+ \x
+(1 row)
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+ encode
+--------
+ AQ
+(1 row)
+
+SELECT decode('AQ', 'base64url'); -- \x01
+ decode
+--------
+ \x01
+(1 row)
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+ encode
+--------
+ AQI
+(1 row)
+
+SELECT decode('AQI', 'base64url'); -- \x0102
+ decode
+--------
+ \x0102
+(1 row)
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+ encode
+--------
+ AQID
+(1 row)
+
+SELECT decode('AQID', 'base64url'); -- \x010203
+ decode
+----------
+ \x010203
+(1 row)
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+ encode
+--------
+ 3q2-7w
+(1 row)
+
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+ decode
+------------
+ \xdeadbeef
+(1 row)
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AA
+(1 row)
+
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAE
+(1 row)
+
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAEC
+(1 row)
+
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+ encode
+--------
+ AAECAw
+(1 row)
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+ERROR: invalid symbol "@" found while decoding base64url sequence
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+ decode
+--------
+ \x41
+(1 row)
+
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+ decode
+--------
+ \x4102
+(1 row)
+
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64url end sequence
+ERROR: invalid base64url end sequence
+HINT: Input data is missing padding, is truncated, or is otherwise corrupted.
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+ERROR: unexpected "=" while decoding base64url sequence
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+ decode
+------------
+ \x69b73eff
+(1 row)
+
--
-- get_bit/set_bit etc
--
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 5ed421d6205..c05f3413699 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -799,6 +799,60 @@ SELECT decode(encode(('\x' || repeat('1234567890abcdef0001', 7))::bytea,
SELECT encode('\x1234567890abcdef00', 'escape');
SELECT decode(encode('\x1234567890abcdef00', 'escape'), 'escape');
+--
+-- base64url encoding/decoding
+--
+SET bytea_output TO hex;
+
+-- Simple encoding/decoding
+SELECT encode('\x69b73eff', 'base64url'); -- abc-_w
+SELECT decode('abc-_w', 'base64url'); -- \x69b73eff
+
+-- Round-trip: decode(encode(x)) = x
+SELECT decode(encode('\x1234567890abcdef00', 'base64url'), 'base64url'); -- \x1234567890abcdef00
+
+-- Empty input
+SELECT encode('', 'base64url'); -- ''
+SELECT decode('', 'base64url'); -- ''
+
+-- 1 byte input
+SELECT encode('\x01', 'base64url'); -- AQ
+SELECT decode('AQ', 'base64url'); -- \x01
+
+-- 2 byte input
+SELECT encode('\x0102'::bytea, 'base64url'); -- AQI
+SELECT decode('AQI', 'base64url'); -- \x0102
+
+-- 3 byte input (no padding needed)
+SELECT encode('\x010203'::bytea, 'base64url'); -- AQID
+SELECT decode('AQID', 'base64url'); -- \x010203
+
+-- 4 byte input (results in 6 base64 chars)
+SELECT encode('\xdeadbeef'::bytea, 'base64url'); -- 3q2-7w
+SELECT decode('3q2-7w', 'base64url'); -- \xdeadbeef
+
+-- Round-trip test for all lengths from 0–4
+SELECT encode(decode(encode(E'\\x', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x0001', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x000102', 'base64url'), 'base64url'), 'base64url');
+SELECT encode(decode(encode(E'\\x00010203', 'base64url'), 'base64url'), 'base64url');
+
+-- Invalid inputs (should ERROR)
+-- invalid character '@'
+SELECT decode('QQ@=', 'base64url');
+
+-- missing characters (incomplete group)
+SELECT decode('QQ', 'base64url'); -- ok (1 byte)
+SELECT decode('QQI', 'base64url'); -- ok (2 bytes)
+SELECT decode('QQIDQ', 'base64url'); -- ERROR: invalid base64url end sequence
+
+-- unexpected '=' at start
+SELECT decode('=QQQ', 'base64url');
+
+-- valid base64 padding in base64url (optional, but accepted)
+SELECT decode('abc-_w==', 'base64url'); -- should decode to \x69b73eff
+
--
-- get_bit/set_bit etc
--
--
2.39.3 (Apple Git-146)
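On the length estimate in pg_base64url_enc_len in the patch above: `(srclen + 2) / 3 * 4` is the size of the padded encoding, so for unpadded output it acts as an upper bound, not an exact count. The sketch below compares that bound with the exact unpadded length, ceil(4 * srclen / 3). Both function names are hypothetical, and overflow for very large inputs is ignored.

```c
#include <stddef.h>

/* Padded size, as computed in the patch: safe upper bound for base64url. */
static size_t
b64url_enc_len_bound(size_t srclen)
{
    return (srclen + 2) / 3 * 4;
}

/* Exact number of symbols without '=' padding: ceil(4 * srclen / 3). */
static size_t
b64url_enc_len_exact(size_t srclen)
{
    return (srclen * 4 + 2) / 3;
}
```

For 1, 2, 3 and 4 input bytes the exact lengths are 2, 3, 4 and 6, matching the 'AQ', 'AQI', 'AQID' and '3q2-7w' results in the regression tests, while the bound gives 4, 4, 4 and 8.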
On 19 Sep 2025, at 08:56, Chao Li <li.evan.chao@gmail.com> wrote:
On Sep 19, 2025, at 14:45, Florents Tselai <florents.tselai@gmail.com> wrote:
This is a normal usage that injects sub-strings based on condition. However, PG doesn’t like that, see here: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES
Well, that’s a very interesting catch.
I’ll let a committer confirm & advise.
I learned this when I once reviewed a patch by Tom Lane. It had a similar situation, but Tom wrote code like:
```
if (something)
    ereport("function xxx")
else
    ereport("procedure xxx")
```
I suggested avoiding the duplicated code the way your code does, and the response was “no”, along with that link.
Tom is right (unsurprisingly) here, since "function" and "procedure" are terms
which are translated and depending on which is used it may change the sentence
structure in the target language.
In this case we inject a name which isn't to be translated, and that will
instead help the translator since they otherwise need to translate two strings
instead of just one (and they can move the %s to position the injected name
into the right place according to grammar rules).
--
Daniel Gustafsson
On 18 Sep 2025, at 21:19, Daniel Gustafsson <daniel@yesql.se> wrote:
.. else I'll take care of it tomorrow.
FWIW since there were new reviews and comments I wanted to allow some more time
for additional comments, so will do this over the weekend instead.
--
Daniel Gustafsson
On 19 Sep 2025, at 23:04, Daniel Gustafsson <daniel@yesql.se> wrote:
On 18 Sep 2025, at 21:19, Daniel Gustafsson <daniel@yesql.se> wrote:
.. else I'll take care of it tomorrow.
FWIW since there were new reviews and comments I wanted to allow some more time
for additional comments, so will do this over the weekend instead.
And, done.
--
Daniel Gustafsson