9.4. String Functions and Operators page does not document that encode adds line breaks

Started by PG Bug reporting form about 6 years ago · 6 messages · docs
#1PG Bug reporting form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/functions-string.html
Description:

It took me a long time to discover why a base 64 encoded SHA-512 hash was 89
characters long instead of the expected 88. The documentation does not
mention that the encode function inserts newlines after 76 characters.
Please make a note of this behavior.
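The arithmetic behind the 88 versus 89 figures can be reproduced outside the database. The following is a minimal Python sketch, not the server's implementation: `wrap_with_newlines` is a hypothetical helper that approximates the reported behavior by inserting a bare newline after every 76 characters, and the input string is arbitrary.

```python
import base64
import hashlib

LINE_LEN = 76  # line length at which the reported output is broken


def wrap_with_newlines(encoded: str, width: int = LINE_LEN) -> str:
    """Hypothetical helper: join `width`-character chunks with a bare newline.

    No newline is appended after the final chunk.
    """
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


digest = hashlib.sha512(b"example input").digest()    # SHA-512 yields 64 raw bytes
unwrapped = base64.b64encode(digest).decode("ascii")  # 64 bytes -> 88 base64 characters
wrapped = wrap_with_newlines(unwrapped)               # 76 chars + newline + 12 chars

print(len(digest), len(unwrapped), len(wrapped))      # prints: 64 88 89
```

The single extra character is the newline at index 76, which is why the value no longer fits a column sized for exactly 88 characters.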

By the way, this is a very poor design decision. The function has no
knowledge of how the string is going to be used. If it is going to be
displayed on an 80-character terminal, then the newline makes sense. If it
is going to be written to a PEM-encoded file, then the newline is to be
expected. But I'm inserting the result into a VARCHAR(88) column and
comparing with base-64 encoded strings from Node.js. There is no reason for
the results to be terminal or file friendly. Instead, they should be machine
friendly. The decision to add newlines should have been made on display or
on creation of the PEM file, where that information becomes available. The
workaround of trimming whitespace characters from the encoded string is ugly
and unacceptable.

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

On Sat, Feb 8, 2020 at 12:10 PM PG Doc comments form <noreply@postgresql.org>
wrote:

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/12/functions-string.html
Description:

It took me a long time to discover why a base 64 encoded SHA-512 hash was 89
characters long instead of the expected 88. The documentation does not
mention that the encode function inserts newlines after 76 characters.
Please make a note of this behavior.

Patch submissions are welcomed. Though there is an argument for this being
an implementation detail one shouldn't rely upon and therefore should not
be described in user-facing documentation.

By the way, this is a very poor design decision.

It seems to be something we inherited some 20 years ago and are not likely
to change even though I suspect you will find general agreement with your
position. Though since it isn't documented, maybe changing it would be OK.

The workaround of trimming whitespace characters from the encoded string is ugly
and unacceptable.

It may be a bit ugly but when dealing with base64, specifically when
decoding, whitespace of this nature is expressly allowed. Its historical
presence here, based upon MIME requirements prevalent at the time the code
was written, doesn't alter its meaning, and so it is somewhat rightfully
considered an implementation detail that is not necessary to document.

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

PG Doc comments form <noreply@postgresql.org> writes:

It took me a long time to discover why a base 64 encoded SHA-512 hash was 89
characters long instead of the expected 88. The documentation does not
mention that the encode function inserts newlines after 76 characters.
Please make a note of this behavior.

That was done a few weeks ago in HEAD:

https://www.postgresql.org/docs/devel/functions-binarystring.html

The base64 format is that of RFC 2045 Section 6.8. As per the RFC,
encoded lines are broken at 76 characters. However instead of the MIME
CRLF end-of-line marker, only a newline is used for end-of-line. The
decode function ignores carriage-return, newline, space, and tab
characters. Otherwise, an error is raised when decode is supplied
invalid base64 data — including when trailing padding is incorrect.
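The decoding rules quoted above (ignore carriage return, newline, space, and tab; reject anything else that is invalid, including bad padding) can be illustrated with a short Python sketch. It mirrors the documented rules rather than PostgreSQL's actual code; `lenient_decode` and `IGNORED` are names invented for this example.

```python
import base64
import binascii

IGNORED = b"\r\n \t"  # carriage return, newline, space, tab


def lenient_decode(text: bytes) -> bytes:
    """Drop the four ignored whitespace bytes, then decode strictly.

    With validate=True, Python raises binascii.Error on characters outside
    the base64 alphabet and on incorrect padding.
    """
    cleaned = text.translate(None, IGNORED)
    return base64.b64decode(cleaned, validate=True)


payload = bytes(range(60))                     # 60 bytes -> 80 base64 characters
encoded = base64.b64encode(payload)
wrapped = encoded[:76] + b"\n" + encoded[76:]  # line break after 76 characters

# Line breaks and other ignored whitespace do not change the decoded result.
assert lenient_decode(wrapped) == payload
assert lenient_decode(wrapped.replace(b"\n", b"\r\n \t")) == payload

# Missing padding and a non-alphabet character are both rejected.
for bad in (b"QUJDRA", b"QUJD*A=="):
    try:
        lenient_decode(bad)
    except binascii.Error as exc:
        print("rejected", bad, "->", exc)
```

A consumer that decodes the value is therefore unaffected by the line breaks; the surprise arises only when the encoded text itself is stored or compared.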

By the way, this is a very poor design decision.

So far as I can tell, that RFC's requirement for line breaks has not
been removed by any later RFC. So you're complaining to the wrong
people.

regards, tom lane

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#3)
Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

On Sun, Feb 9, 2020 at 9:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Doc comments form <noreply@postgresql.org> writes:

The base64 format is that of RFC 2045 Section 6.8. As per the RFC,
encoded lines are broken at 76 characters

By the way, this is a very poor design decision.

So far as I can tell, that RFC's requirement for line breaks has not
been removed by any later RFC. So you're complaining to the wrong
people.

Stating direct RFC4648 compliance would require us to drop the line breaks
that are only being added due to using MIME rules which ideally our general
encoding function would not do. Greenfield we probably would want base64
to be general RFC4648 and add something like base64-mime which performs the
line breaking for the user per RFC 2045, base64-pem which would use that
specific environment's RFC rules. Now, maybe we can add "base64-4648" or
"base64-general" while leaving "base64" alone and using the MIME version of
the rules?

David J.

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David G. Johnston (#4)
Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

On 2020-Feb-09, David G. Johnston wrote:

Stating direct RFC4648 compliance would require us to drop the line breaks
that are only being added due to using MIME rules which ideally our general
encoding function would not do. Greenfield we probably would want base64
to be general RFC4648 and add something like base64-mime which performs the
line breaking for the user per RFC 2045, base64-pem which would use that
specific environment's RFC rules. Now, maybe we can add "base64-4648" or
"base64-general" while leaving "base64" alone and using the MIME version of
the rules?

Patches welcome.

I'm not sure that we *need* to preserve the historical behavior. Many
people would probably be okay with encode('base64') returning no
newlines (since they are useless most of the time anyway), and the
minority that does need them can use encode('base64-rfc2045').

Another idea might be to add an optional 'flags' argument to encode(),
which would be passed on to the encoder/decoder functions.
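To make the proposal concrete, here is a small Python sketch of what a format-name split could look like. It is purely illustrative: the `encode` dispatcher and `_wrap` helper are hypothetical, the format names are the ones floated in this thread, and nothing here describes behavior that PostgreSQL actually ships.

```python
import base64


def _wrap(encoded: str, width: int = 76) -> str:
    """Hypothetical helper: break the text into lines of at most `width` characters."""
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


def encode(data: bytes, fmt: str) -> str:
    """Hypothetical dispatcher over the two format names proposed above.

    'base64'          -> unwrapped output, no line breaks
    'base64-rfc2045'  -> output broken at 76 characters, MIME style
    """
    encoded = base64.b64encode(data).decode("ascii")
    if fmt == "base64":
        return encoded
    if fmt == "base64-rfc2045":
        return _wrap(encoded)
    raise ValueError(f"unrecognized encoding: {fmt!r}")


data = bytes(64)  # same size as a SHA-512 digest
print(len(encode(data, "base64")), len(encode(data, "base64-rfc2045")))  # prints: 88 89
```

A flags argument, the other idea mentioned, would move the same choice from the format name into a separate parameter.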

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#5)
Re: 9.4. String Functions and Operators page does not document that encode adds line breaks

On Thu, Feb 27, 2020 at 03:32:56PM -0300, Alvaro Herrera wrote:

On 2020-Feb-09, David G. Johnston wrote:

Stating direct RFC4648 compliance would require us to drop the line breaks
that are only being added due to using MIME rules which ideally our general
encoding function would not do. Greenfield we probably would want base64
to be general RFC4648 and add something like base64-mime which performs the
line breaking for the user per RFC 2045, base64-pem which would use that
specific environment's RFC rules. Now, maybe we can add "base64-4648" or
"base64-general" while leaving "base64" alone and using the MIME version of
the rules?

Patches welcome.

I'm not sure that we *need* to preserve the historical behavior. Many
people would probably be okay with encode('base64') returning no
newlines (since they are useless most of the time anyway), and the
minority that does need them can use encode('base64-rfc2045').

Another idea might be to add an optional 'flags' argument to encode(),
which would be passed on to the encoder/decoder functions.

I have had this force-wrap problem using Linux command-line tools. You
can see it when using xxd here on page 54:

https://momjian.us/main/writings/tls.pdf#page=54

xxd allows you to specify a maximum length, so I used -cols 999 to avoid
the wrap. Other times I used a tool to remove the newlines from the
output. I think you should just use the existing Postgres SQL string
functions to remove the newlines.
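The newline-stripping workaround amounts to a plain string replacement. The Python sketch below shows the idea under stated assumptions: `strip_line_breaks` is a hypothetical helper and the wrapped value is constructed by hand to stand in for wrapped encoder output. In SQL, an existing string function such as replace() or translate() would presumably play the same role.

```python
import base64
import hashlib


def strip_line_breaks(encoded: str) -> str:
    """Hypothetical helper: remove newline and carriage-return characters."""
    return encoded.replace("\n", "").replace("\r", "")


digest = hashlib.sha512(b"example input").digest()
expected = base64.b64encode(digest).decode("ascii")  # 88 characters, as another system would produce
wrapped = expected[:76] + "\n" + expected[76:]       # stand-in for output broken at 76 characters

cleaned = strip_line_breaks(wrapped)
print(cleaned == expected, len(cleaned))             # prints: True 88
```

Because base64 output never contains whitespace of its own, removing the line breaks cannot alter the encoded data.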

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +