Error in calculating length of encoded base64 string
Greetings, everyone!
While working on an extension I've found an error in how length of
encoded base64 string is calulated;
This error is present in 3 files across all supported versions:
/src/common/base64.c, function pg_b64_enc_len;
/src/backend/utils/adt/encode.c, function pg_base64_enc_len;
/contrib/pgcrypto/pgp-armor.c, function pg_base64_enc_len (copied from
encode.c).
In all three cases the length is calculated as follows:
(srclen + 2) * 4 / 3; (plus linefeed in latter two cases)
There's also a comment /* 3 bytes will be converted to 4 */
This formula is wrong. Let's calculate encoded length for different
starting lengths:
starting length 2: (2 + 2) * 4 / 3 = 5,
starting length 3: (3 + 2) * 4 / 3 = 6,
starting length 4: (4 + 2) * 4 / 3 = 8,
starting length 6: (6 + 2) * 4 / 3 = 10,
starting length 10: (10 + 2) * 4 / 3 = 16,
when it should be 4, 4, 8, 8, 16.
So the suggestion is to change the formula to a right one: (srclen + 2)
/ 3 * 4;
The patch is attached.
Oleg Tselebrovskiy, Postgres Pro
Attachments:
base64_encoded_length.patchtext/x-diff; name=base64_encoded_length.patchDownload
diff --git a/contrib/pgcrypto/pgp-armor.c b/contrib/pgcrypto/pgp-armor.c
index 679779a6ac..9128756647 100644
--- a/contrib/pgcrypto/pgp-armor.c
+++ b/contrib/pgcrypto/pgp-armor.c
@@ -165,7 +165,7 @@ pg_base64_enc_len(unsigned srclen)
/*
* 3 bytes will be converted to 4, linefeed after 76 chars
*/
- return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
+ return (srclen + 2) / 3 * 4 + srclen / (76 * 3 / 4);
}
static unsigned
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index b92191de81..e5ac3ad23d 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -385,7 +385,7 @@ static uint64
pg_base64_enc_len(const char *src, size_t srclen)
{
/* 3 bytes will be converted to 4, linefeed after 76 chars */
- return ((uint64) srclen + 2) * 4 / 3 + (uint64) srclen / (76 * 3 / 4);
+ return ((uint64) srclen + 2) / 3 * 4 + (uint64) srclen / (76 * 3 / 4);
}
static uint64
diff --git a/src/common/base64.c b/src/common/base64.c
index 2943ac7652..ec4eb49382 100644
--- a/src/common/base64.c
+++ b/src/common/base64.c
@@ -224,7 +224,7 @@ int
pg_b64_enc_len(int srclen)
{
/* 3 bytes will be converted to 4 */
- return (srclen + 2) * 4 / 3;
+ return (srclen + 2) / 3 * 4;
}
/*
o.tselebrovskiy@postgrespro.ru writes:
While working on an extension I've found an error in how length of
encoded base64 string is calulated;
Yeah, I think you're right. It's not of huge significance, because
it just overestimates by 1 or 2 bytes, but we might as well get
it right. Thanks for the report and patch!
regards, tom lane
On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
o.tselebrovskiy@postgrespro.ru writes:
While working on an extension I've found an error in how length of
encoded base64 string is calulated;Yeah, I think you're right. It's not of huge significance, because
it just overestimates by 1 or 2 bytes, but we might as well get
it right. Thanks for the report and patch!
From your commit d98ed080bb
This bug is very ancient, dating to commit 79d78bb26 which
added encode.c. (The other instances were presumably copied
from there.) Still, it doesn't quite seem worth back-patching.
Is it worth investing time in trying to unify these 3 occurrences of
base64 length (and possibly other relevant) code to one place? If yes,
I can volunteer for it.
The common code facility under src/common/ did not exist back when
pgcrypto was added, but since it does now, it may be worth it make
others depend on implementation in src/common/ code.
Best regards,
Gurjeet
http://Gurje.et
Gurjeet Singh <gurjeet@singh.im> writes:
On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This bug is very ancient, dating to commit 79d78bb26 which
added encode.c. (The other instances were presumably copied
from there.) Still, it doesn't quite seem worth back-patching.
Is it worth investing time in trying to unify these 3 occurrences of
base64 length (and possibly other relevant) code to one place? If yes,
I can volunteer for it.
I wondered about that too. It seems really silly that we made
a copy in src/common and did not replace the others with calls
to that.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
Gurjeet Singh <gurjeet@singh.im> writes:
On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This bug is very ancient, dating to commit 79d78bb26 which
added encode.c. (The other instances were presumably copied
from there.) Still, it doesn't quite seem worth back-patching.Is it worth investing time in trying to unify these 3 occurrences of
base64 length (and possibly other relevant) code to one place? If yes,
I can volunteer for it.I wondered about that too. It seems really silly that we made
a copy in src/common and did not replace the others with calls
to that.
Also, while we're at it, how about some unit tests that both encode and
calculate the encoded length of strings of various lengths and check
that they match?
- ilmari
On 2023-Jun-09, Tom Lane wrote:
Gurjeet Singh <gurjeet@singh.im> writes:
On Thu, Jun 8, 2023 at 7:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
This bug is very ancient, dating to commit 79d78bb26 which
added encode.c. (The other instances were presumably copied
from there.) Still, it doesn't quite seem worth back-patching.Is it worth investing time in trying to unify these 3 occurrences of
base64 length (and possibly other relevant) code to one place? If yes,
I can volunteer for it.I wondered about that too. It seems really silly that we made
a copy in src/common and did not replace the others with calls
to that.
I looked into this. It turns out that there is a difference in newline
handling in the other routines compared to what was added for SCRAM,
which doesn't have any (and complains if you supply them). Peter E
did suggest to unify them at the time:
/messages/by-id/947b9aff-8fdb-dbf5-a99c-0ffd4523a73f@2ndquadrant.com
We could add a boolean "whitespace" flag to both of
src/common/base64.c's pg_b64_encode() and pg_b64_decode(); with that I
think it could serve the three places that need it.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/