Patch to document base64 encoding

Started by Karl O. Pincalmost 7 years ago48 messages

Karl O. Pinc

kop@meme.com

almost 7 years ago

1 attachment(s)

Hi,

Doc patch, against master. Documents encode() and decode() base64
format.

Builds for me.

Attached: doc_base64_v1.patch

References RFC2045 section 6.8 to define base64.

Because encode() and decode() show up in both the string
functions section and the binary string functions section
I documented in only the string functions section and hyperlinked
"base64" in both sections to the new text.

Note that XML output can also generate base64 data. I suspect
this is done via the (different, src/common/base64.c)
pg_b64_encode() function which does not limit line length.
In any case this patch does not touch the XML documentation.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v1.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..f6b4b2c5a0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,13 +1772,16 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+        formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
         <literal>escape</literal> converts zero bytes and high-bit-set bytes to
         octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
         doubles backslashes.
@@ -2365,6 +2371,20 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>base64</primary>
+   </indexterm>
+
+   <para id="base64-encoding">
+     The base64 encoding of the <function>encode</function>
+     and <function>decode</function> functions is that of RFC2045 section 6.8.
+     As per the RFC, encoded lines are broken at 76 characters.  However
+     instead of the MIME CRLF end-of-line marker, only a newline is used for
+     end-of-line.  The carriage-return, newline, space, and tab characters are
+     ignored by <function>decode</function>.  Otherwise, an error is raised
+     when <function>decode</function> is supplied invalid base64 data.
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,13 +3597,16 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+       formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
        doubles backslashes.

Fabien COELHO

coelho@cri.ensmp.fr

almost 7 years ago

In reply to: Karl O. Pinc (#1)

Re: Patch to document base64 encoding

Hello Karl,

Doc patch, against master. Documents encode() and decode() base64
format.

It is already documented. Enhance documentation, though.

Builds for me.

For me as well. Looks ok.

Attached: doc_base64_v1.patch

References RFC2045 section 6.8 to define base64.

Did you consider referencing RFC 4648 instead?

--
Fabien.

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Fabien COELHO (#2)

1 attachment(s)

Re: Patch to document base64 encoding

Hi Fabien,

On Tue, 5 Mar 2019 07:09:01 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Doc patch, against master. Documents encode() and decode() base64
format.

It is already documented. Enhance documentation, though.

Right. I was thinking that there are various implementations
of the base64 data format and so it needed more than
just to be named.

Attached: doc_base64_v1.patch

References RFC2045 section 6.8 to define base64.

Did you consider referencing RFC 4648 instead?

Not really. What drew me to document was the line
breaks every 76 characters. So I pretty much went
straight to the MIME RFC which says there should
be breaks at 76 characters.

I can see advantages and disadvantages either way.
More or less extraneous information either semi
or not base64 related in either RFC.
Which RFC do you think should be referenced?

Attached: doc_base64_v2.patch

This new version adds a phrase clarifying that
decode errors are raised when trailing padding
is wrong. Seemed like I may as well be explicit.

(I am not entirely pleased with the double dash
but can't come up with anything better. And
can't make an emdash entity work either.)

Thanks for taking a look.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v2.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..95b8f5cd5a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,13 +1772,16 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+        formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
         <literal>escape</literal> converts zero bytes and high-bit-set bytes to
         octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
         doubles backslashes.
@@ -2365,6 +2371,21 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>base64</primary>
+   </indexterm>
+
+   <para id="base64-encoding">
+     The base64 encoding of the <function>encode</function>
+     and <function>decode</function> functions is that of RFC2045 section 6.8.
+     As per the RFC, encoded lines are broken at 76 characters.  However
+     instead of the MIME CRLF end-of-line marker, only a newline is used for
+     end-of-line.  The carriage-return, newline, space, and tab characters are
+     ignored by <function>decode</function>.  Otherwise, an error is raised
+     when <function>decode</function> is supplied invalid base64 data --
+     including when trailing padding is incorrect.
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,13 +3598,16 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+       formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
        doubles backslashes.

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Karl O. Pinc (#3)

1 attachment(s)

Re: Patch to document base64 encoding

On Tue, 5 Mar 2019 07:26:17 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

(I am not entirely pleased with the double dash
but can't come up with anything better. And
can't make an emdash entity work either.)

Attached: doc_base64_v3.patch

There is an mdash entity. This patch uses that.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v3.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..e5ae322faa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,13 +1772,16 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+        formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
         <literal>escape</literal> converts zero bytes and high-bit-set bytes to
         octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
         doubles backslashes.
@@ -2365,6 +2371,21 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>base64</primary>
+   </indexterm>
+
+   <para id="base64-encoding">
+     The base64 encoding of the <function>encode</function>
+     and <function>decode</function> functions is that of RFC2045 section 6.8.
+     As per the RFC, encoded lines are broken at 76 characters.  However
+     instead of the MIME CRLF end-of-line marker, only a newline is used for
+     end-of-line.  The carriage-return, newline, space, and tab characters are
+     ignored by <function>decode</function>.  Otherwise, an error is raised
+     when <function>decode</function> is supplied invalid base64 data &mdash;
+     including when trailing padding is incorrect.
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,13 +3598,16 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+       formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
        doubles backslashes.

Fabien COELHO

coelho@cri.ensmp.fr

almost 7 years ago

In reply to: Karl O. Pinc (#4)

Re: Patch to document base64 encoding

Hello Karl,

Attached: doc_base64_v3.patch

I'm ok with referencing the historical MIME RFC.

"RFC2045 section 6.8" -> "RFC 2045 Section 6.8"

you can link to the RFC directly with:

<ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC 2045
Section 6.8</ulink>

--
Fabien.

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Fabien COELHO (#5)

1 attachment(s)

Re: Patch to document base64 encoding

Hi Fabien,

On Tue, 5 Mar 2019 23:02:26 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Attached: doc_base64_v3.patch

I'm ok with referencing the historical MIME RFC.

For the record, RFC 2045 is updated but not
yet obsolete. The updates don't invalidate
section 6.8.

"RFC2045 section 6.8" -> "RFC 2045 Section 6.8"

you can link to the RFC directly with:

<ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC 2045
Section 6.8</ulink>

Done.

Attached: doc_base64_v4.patch

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v4.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..06dfd84d22 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,13 +1772,16 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+        formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
         <literal>escape</literal> converts zero bytes and high-bit-set bytes to
         octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
         doubles backslashes.
@@ -2365,6 +2371,23 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>base64</primary>
+   </indexterm>
+
+   <para id="base64-encoding">
+     The base64 encoding of the <function>encode</function>
+     and <function>decode</function> functions is that
+     of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC 2045
+     section 6.8</ulink>.  As per the RFC, encoded lines are broken at 76
+     characters.  However instead of the MIME CRLF end-of-line marker, only a
+     newline is used for end-of-line.  The carriage-return, newline, space,
+     and tab characters are ignored by <function>decode</function>.
+     Otherwise, an error is raised when <function>decode</function> is
+     supplied invalid base64 data &mdash; including when trailing padding is
+     incorrect.
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,13 +3600,16 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+       formats are: <link linkend="base64-encoding"><literal>base64</literal></link>, <literal>hex</literal>, <literal>escape</literal>.
        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
        doubles backslashes.

Michael Paquier

michael@paquier.xyz

almost 7 years ago

In reply to: Karl O. Pinc (#6)

Re: Patch to document base64 encoding

On Tue, Mar 05, 2019 at 07:55:22PM -0600, Karl O. Pinc wrote:

Attached: doc_base64_v4.patch

Details about the "escape" mode are already available within the
description of function "encode". Wouldn't we want to consolidate a
description for all the modes at the same place, including some words
for hex? Your patch only includes the description of base64, which is
a good addition, still not consistent with the rest. A paragraph
after all the functions listed is fine I think as the description is
long so it would bloat the table if included directly.
--
Michael

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Michael Paquier (#7)

1 attachment(s)

Re: Patch to document base64 encoding

On Wed, 6 Mar 2019 11:27:38 +0900
Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Mar 05, 2019 at 07:55:22PM -0600, Karl O. Pinc wrote:

Attached: doc_base64_v4.patch

Details about the "escape" mode are already available within the
description of function "encode". Wouldn't we want to consolidate a
description for all the modes at the same place, including some words
for hex? Your patch only includes the description of base64, which is
a good addition, still not consistent with the rest. A paragraph
after all the functions listed is fine I think as the description is
long so it would bloat the table if included directly.

Makes sense. (As did hyperlinking to the RFC.)

(No matter how simple I think a patch is going to be it
always turns into a project. :)

Attached: doc_base64_v5.patch

Made index entries for hex and escape encodings.

Added word "encoding" to index entries.

Made <varlist> entries with terms for
base64, hex, and escape encodings.

Added documentation for hex and escape encodings,
including output formats and what are acceptable
inputs.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v5.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..bd337fd530 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,16 +1772,25 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
        <entry><literal>MTIzAAE=</literal></entry>
@@ -2365,6 +2377,84 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para>
+     The string and the binary <function>encode</function>
+     and <function>decode</function> functions support the following
+     encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The carriage-return, newline, space, and tab characters are
+             ignored by <function>decode</function>.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             <literal>hex</literal> represents each 4 bits of data as a single
+             hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             <literal>escape</literal> converts zero bytes and high-bit-set
+             bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts any number of
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,16 +3667,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
+       formats are:
+       <link linkend="base64-encoding"><literal>base64</literal></link>,
+       <link linkend="hex-encoding"><literal>hex</literal></link>,
+       <link linkend="escape-encoding"><literal>escape</literal></link>.
       </entry>
       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
       <entry><literal>123\000456</literal></entry>

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Karl O. Pinc (#8)

1 attachment(s)

Re: Patch to document base64 encoding

On Tue, 5 Mar 2019 23:23:20 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

Added documentation for hex and escape encodings,
including output formats and what are acceptable
inputs.

Attached: doc_base64_v6.patch

Added index entries for encode and decode functions
above the encoding documentation. As index entries
are currently generated this does not make any
difference. But this paragraph is very far
from the other encode/decode index targets on
the page.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v6.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..bf724d3dee 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,16 +1772,25 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
        <entry><literal>MTIzAAE=</literal></entry>
@@ -2365,6 +2377,90 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para>
+     The string and the binary <function>encode</function>
+     and <function>decode</function> functions support the following
+     encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The carriage-return, newline, space, and tab characters are
+             ignored by <function>decode</function>.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             <literal>hex</literal> represents each 4 bits of data as a single
+             hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             <literal>escape</literal> converts zero bytes and high-bit-set
+             bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts any number of
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,16 +3673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
+       formats are:
+       <link linkend="base64-encoding"><literal>base64</literal></link>,
+       <link linkend="hex-encoding"><literal>hex</literal></link>,
+       <link linkend="escape-encoding"><literal>escape</literal></link>.
       </entry>
       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
       <entry><literal>123\000456</literal></entry>

#10

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Karl O. Pinc (#9)

1 attachment(s)

Re: Patch to document base64 encoding

On Wed, 6 Mar 2019 07:09:48 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

On Tue, 5 Mar 2019 23:23:20 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

Added documentation for hex and escape encodings,
including output formats and what are acceptable
inputs.

Attached: doc_base64_v7.patch

Improved escape decoding sentence.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v7.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..989344c98c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,16 +1772,25 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
        <entry><literal>MTIzAAE=</literal></entry>
@@ -2365,6 +2377,90 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para>
+     The string and the binary <function>encode</function>
+     and <function>decode</function> functions support the following
+     encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The carriage-return, newline, space, and tab characters are
+             ignored by <function>decode</function>.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             <literal>hex</literal> represents each 4 bits of data as a single
+             hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             <literal>escape</literal> converts zero bytes and high-bit-set
+             bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,16 +3673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
+       formats are:
+       <link linkend="base64-encoding"><literal>base64</literal></link>,
+       <link linkend="hex-encoding"><literal>hex</literal></link>,
+       <link linkend="escape-encoding"><literal>escape</literal></link>.
       </entry>
       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
       <entry><literal>123\000456</literal></entry>

#11

Fabien COELHO

coelho@cri.ensmp.fr

almost 7 years ago

In reply to: Karl O. Pinc (#10)

Re: Patch to document base64 encoding

Attached: doc_base64_v7.patch

Patch applies cleanly, doc compiles, navigation tested and ok.

"... section 6.8" -> "... Section 6.8" (capital S).

"The string and the binary encode and decode functions..." sentence looks
strange to me, especially with the English article that I do not really
master, so maybe it is ok. I'd have written something more
straightforward, eg: "Functions encode and decode support the following
encodings:", and also I'd use a direct "Function <...>decode</...> ..."
rather than "The <function>decode</function> function ..." (twice).

Maybe I'd use the exact same grammatical structure for all 3 cases,
starting with "The <>whatever</> encoding converts bla bla bla" instead of
varying the sentences.

Otherwise, all explanations look both precise and useful to me.

--
Fabien.

#12

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Fabien COELHO (#11)

1 attachment(s)

Re: Patch to document base64 encoding

On Wed, 6 Mar 2019 19:30:16 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

"... section 6.8" -> "... Section 6.8" (capital S).

Fixed.

"The string and the binary encode and decode functions..." sentence
looks strange to me, especially with the English article that I do
not really master, so maybe it is ok. I'd have written something more
straightforward, eg: "Functions encode and decode support the
following encodings:",

It is an atypical construction because I want to draw attention that
this is documentation not only for the encode() and decode() in
section 9.4. String Functions and Operators but also for
the encode() and decode in section 9.5. Binary String Functions
and Operators. Although I can't think of a better approach
it makes me uncomfortable that documentation written in
one section applies equally to functions in a different section.

Do you think it would be useful to hyperlink the word "binary"
to section 9.5?

The idiomatic phrasing would be "Both the string and the binary
encode and decode functions..." but the word "both" adds
no information. Shorter is better.

and also I'd use a direct "Function
<...>decode</...> ..." rather than "The <function>decode</function>
function ..." (twice).

The straightforward English would be "Decode accepts...". The problem
is that this begins the sentence with the name of a function.
This does not work very well when the function name is all lower case,
and can have other problems where clarity is lost depending
on documentation output formatting.

I don't see a better approach.

Maybe I'd use the exact same grammatical structure for all 3 cases,
starting with "The <>whatever</> encoding converts bla bla bla"
instead of varying the sentences.

Agreed. Good idea. The first paragraph of each term has to
do with encoding and the second with decoding.
Uniformity in starting the second paragraphs helps make
this clear, even though the first paragraphs are not uniform.
With this I am not concerned that the first paragraphs
do not have a common phrasing that's very explicit about
being about encoding.

Adjusted.

Otherwise, all explanations look both precise and useful to me.

When writing I was slightly concerned about being overly precise;
permanently committing to behavior that might (possibly) be an artifact
of implementation. E.g., that hex decoding accepts both
upper and lower case A-F characters, what input is ignored
and what raises an error, etc. But it seems best
to document existing behavior, all of which has existed so long
anyway that changing it would be disruptive. If anybody cares
they can object.

I wrote the docs by reading the code and did only a little
actual testing to be sure that what I wrote is correct.
I also did not check for regression tests which confirm
the behavior I'm documenting. (It wouldn't hurt to have
such regression tests, if they don't already exist.
But writing regression tests is more than I want to take on
with this patch. Feel free to come up with tests. :-)

I'm confident that the behavior I documented is how PG behaves
but you should know what I did in case you want further
validation.

Attached: doc_base64_v8.patch

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v8.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..e756bf53ba 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,16 +1772,25 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
        <entry><literal>MTIzAAE=</literal></entry>
@@ -2365,6 +2377,90 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para>
+     The string and the binary <function>encode</function>
+     and <function>decode</function> functions support the following
+     encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             <literal>hex</literal> represents each 4 bits of data as a single
+             hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             <literal>escape</literal> converts zero bytes and high-bit-set
+             bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,16 +3673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
+       formats are:
+       <link linkend="base64-encoding"><literal>base64</literal></link>,
+       <link linkend="hex-encoding"><literal>hex</literal></link>,
+       <link linkend="escape-encoding"><literal>escape</literal></link>.
       </entry>
       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
       <entry><literal>123\000456</literal></entry>

#13

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Karl O. Pinc (#12)

Re: Patch to document base64 encoding

Hi Fabien (and Michael),

On Wed, 6 Mar 2019 16:37:05 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

I'm confident that the behavior I documented is how PG behaves
but you should know what I did in case you want further
validation.

Attached: doc_base64_v8.patch

FYI. To avoid a stall in the patch submission process.

I notice that nobody has signed up as a reviewer for
this patch. When the patch looks "ready" it needs
to be marked as such at the PG commitfest website
and a committer will consider committing.

The commitfest URL is:

https://commitfest.postgresql.org/23/

No rush.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#14

Fabien COELHO

coelho@cri.ensmp.fr

almost 7 years ago

In reply to: Karl O. Pinc (#12)

Re: Patch to document base64 encoding

Hello Karl,

I registered as a reviewer in the CF app.

"The string and the binary encode and decode functions..." sentence
looks strange to me, especially with the English article that I do
not really master, so maybe it is ok. I'd have written something more
straightforward, eg: "Functions encode and decode support the
following encodings:",

It is an atypical construction because I want to draw attention that
this is documentation not only for the encode() and decode() in section
9.4. String Functions and Operators but also for the encode() and decode
in section 9.5. Binary String Functions and Operators. Although I can't
think of a better approach it makes me uncomfortable that documentation
written in one section applies equally to functions in a different
section.

People coming from the binary doc would have no reason to look at the
string paragraph anyway.

Do you think it would be useful to hyperlink the word "binary"
to section 9.5?

Hmmm... I think that the link is needed in the other direction.

I'd suggest (1) to use a simpler and direct sentence in the string
section, (2) to simplify/shorten the in cell description in the binary
section, and (3) to add an hyperlink from the binary section which would
point to the expanded explanation in the string section.

The idiomatic phrasing would be "Both the string and the binary
encode and decode functions..." but the word "both" adds
no information. Shorter is better.

Possibly, although "Both" would insist on the fact that it applies to the
two variants, which was your intention.

and also I'd use a direct "Function
<...>decode</...> ..." rather than "The <function>decode</function>
function ..." (twice).

The straightforward English would be "Decode accepts...". The problem
is that this begins the sentence with the name of a function.
This does not work very well when the function name is all lower case,
and can have other problems where clarity is lost depending
on documentation output formatting.

Yep.

I don't see a better approach.

I suggested "Function <>decode</> ...", which is the kind of thing we do
in academic writing to improve precision, because I thought it could be
better:-)

Maybe I'd use the exact same grammatical structure for all 3 cases,
starting with "The <>whatever</> encoding converts bla bla bla"
instead of varying the sentences.

Agreed. Good idea. The first paragraph of each term has to
do with encoding and the second with decoding.

Uniformity in starting the second paragraphs helps make
this clear, even though the first paragraphs are not uniform.
With this I am not concerned that the first paragraphs
do not have a common phrasing that's very explicit about
being about encoding.

Adjusted.

Cannot see it fully in the v8 patch:

- The <literal>base64</literal> encoding is
- <literal>hex</literal> represents
- <literal>escape</literal> converts

Otherwise, all explanations look both precise and useful to me.

When writing I was slightly concerned about being overly precise;

Hmmm. That is a technical documentation, a significant degree of precision
is expected.

--
Fabien.

#15

Karl O. Pinc

kop@meme.com

almost 7 years ago

In reply to: Fabien COELHO (#14)

3 attachment(s)

Re: Patch to document base64 encoding

Hi Fabien,

On Sun, 10 Mar 2019 08:15:35 +0100 (CET)
Fabien COELHO <coel

I registered as a reviewer in the CF app.

Thanks.

What's causing problems here is that the encode and decode
functions are listed in both the string functions section
and the binary functions section. A related but not-relevant
problem is that there are functions listed in the string
function section which take binary input.

I asked about this on IRC and the brief reply was
unflattering to the existing documentation.

So I'm going to fix this also. 3 patches attached:

doc_base64_part1_v9.patch

This moves functions taking bytea and other non-string
input into the binary string section, and vice versa.
Eliminates duplicate encode() and decode() documentation.

Affects: convert(bytea, name, name)
convert_from(bytea, name)
encode(bytea, text)
length(bytea, name)
quote_nullable(anytype)
to_hex(int or bigint)
decode(text, text)

Only moves, eliminates duplicates, and adjusts indentation.

doc_base64_part2_v9.patch

Cleanup wording after moving functions between sections.

doc_base64_part3_v9.patch

Documents base64, hex, and escape encode() and decode()
formats.

"The string and the binary encode and decode functions..." sentence
looks strange to me, especially with the English article that I do
not really master, so maybe it is ok. I'd have written something
more straightforward, eg: "Functions encode and decode support the
following encodings:",

It is an atypical construction because I want to draw attention
that this is documentation not only for the encode() and decode()
in section 9.4. String Functions and Operators but also for the
encode() and decode in section 9.5. Binary String Functions and
Operators. Although I can't think of a better approach it makes me
uncomfortable that documentation written in one section applies
equally to functions in a different section.

People coming from the binary doc would have no reason to look at the
string paragraph anyway.

Do you think it would be useful to hyperlink the word "binary"
to section 9.5?

Hmmm... I think that the link is needed in the other direction.

I'm not sure what you mean here or if it's still relevant.

I'd suggest (1) to use a simpler and direct sentence in the string
section, (2) to simplify/shorten the in cell description in the
binary section, and (3) to add an hyperlink from the binary section
which would point to the expanded explanation in the string section.

The idiomatic phrasing would be "Both the string and the binary
encode and decode functions..." but the word "both" adds
no information. Shorter is better.

Possibly, although "Both" would insist on the fact that it applies to
the two variants, which was your intention.

I think this is no longer relevant. Although I'm not sure what
you mean by 3. The format names already hyperlink back to the
string docs.

and also I'd use a direct "Function
<...>decode</...> ..." rather than "The <function>decode</function>
function ..." (twice).

The straightforward English would be "Decode accepts...". The
problem is that this begins the sentence with the name of a
function. This does not work very well when the function name is
all lower case, and can have other problems where clarity is lost
depending on documentation output formatting.

Yep.

I don't see a better approach.

I suggested "Function <>decode</> ...", which is the kind of thing we
do in academic writing to improve precision, because I thought it
could be better:-)

"Function <>decode</> ..." just does not work in English.

Maybe I'd use the exact same grammatical structure for all 3 cases,
starting with "The <>whatever</> encoding converts bla bla bla"
instead of varying the sentences.

Agreed. Good idea. The first paragraph of each term has to
do with encoding and the second with decoding.

Uniformity in starting the second paragraphs helps make
this clear, even though the first paragraphs are not uniform.
With this I am not concerned that the first paragraphs
do not have a common phrasing that's very explicit about
being about encoding.

Adjusted.

Cannot see it fully in the v8 patch:

- The <literal>base64</literal> encoding is
- <literal>hex</literal> represents
- <literal>escape</literal> converts

I did only the decode paras. I guess no reason not to make
the first paras uniform as well. Done.

I also alphabetized by format name.

I hope that 3 patches will make review easier.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_part3_v9.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 22769e5031..14a587c281 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1711,13 +1711,18 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
+        <link linkend="encoding-options">Options
+        for <parameter>format</parameter></link> are same as
+        in <function>encode</function>.
        </entry>
        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
        <entry><literal>\x3132330001</literal></entry>
@@ -2263,6 +2268,89 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3499,22 +3587,31 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
       <row>
        <entry>
-         <indexterm>
-          <primary>encode</primary>
-         </indexterm>
+        <indexterm>
+         <primary>encode</primary>
+        </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>.
        </entry>
-       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-       <entry><literal>123\000456</literal></entry>
+       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+       <entry><literal>MTIzAAE=</literal></entry>
       </row>
 
       <row>

doc_base64_part1_v9.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a78ea..3d748b660f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1692,48 +1692,6 @@
       <row>
        <entry>
         <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
          <primary>convert_to</primary>
         </indexterm>
         <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
@@ -1765,26 +1723,6 @@
       </row>
 
       <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
-      <row>
        <entry id="format">
         <indexterm>
          <primary>format</primary>
@@ -1852,19 +1790,6 @@
       </row>
 
       <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>lpad</primary>
@@ -2030,18 +1955,6 @@
       </row>
 
       <row>
-       <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Coerce the given value to text and then quote it as a literal;
-        or, if the argument is null, return <literal>NULL</literal>.
-        Embedded single-quotes and backslashes are properly doubled.
-       </entry>
-       <entry><literal>quote_nullable(42.5)</literal></entry>
-       <entry><literal>'42.5'</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>regexp_match</primary>
@@ -2314,22 +2227,6 @@
       <row>
        <entry>
         <indexterm>
-         <primary>to_hex</primary>
-        </indexterm>
-        <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
-        or <type>bigint</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
-        representation
-       </entry>
-       <entry><literal>to_hex(2147483647)</literal></entry>
-       <entry><literal>7fffffff</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
          <primary>translate</primary>
         </indexterm>
         <literal><function>translate(<parameter>string</parameter> <type>text</type>,
@@ -3550,47 +3447,72 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         Remove the longest string containing only bytes appearing in
         <parameter>bytes</parameter> from the start and end of
         <parameter>string</parameter>
-      </entry>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>decode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert string to <parameter>dest_encoding</parameter>.  The
+        original encoding is specified by
+        <parameter>src_encoding</parameter>. The
+        <parameter>string</parameter> must be valid in this encoding.
+        Conversions can be defined by <command>CREATE CONVERSION</command>.
+        Also there are some predefined conversions. See <xref
+        linkend="conversion-names"/> for available conversions.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>encode</primary>
+         <primary>convert_from</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Convert string to the database encoding.  The original encoding
+        is specified by <parameter>src_encoding</parameter>. The
+        <parameter>string</parameter> must be valid in this encoding.
+       </entry>
+       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+      </row>
+
+      <row>
+       <entry>
+         <indexterm>
+          <primary>encode</primary>
+         </indexterm>
+        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
+        <parameter>format</parameter> <type>text</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Encode binary data into a textual representation.  Supported
+        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
+        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
+        doubles backslashes.
+       </entry>
+       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
+       <entry><literal>123\000456</literal></entry>
+      </row>
 
       <row>
        <entry>
@@ -3622,45 +3544,70 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>string</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>string</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>string</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose', 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>string</parameter>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Calculates the MD5 hash of <parameter>string</parameter>,
+        returning the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
+
+      <row>
+       <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Coerce the given value to text and then quote it as a literal;
+        or, if the argument is null, return <literal>NULL</literal>.
+        Embedded single-quotes and backslashes are properly doubled.
+       </entry>
+       <entry><literal>quote_nullable(42.5)</literal></entry>
+       <entry><literal>'42.5'</literal></entry>
+      </row>
 
       <row>
        <entry>
@@ -3753,6 +3700,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>sha512('abc')</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>to_hex</primary>
+        </indexterm>
+        <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
+        or <type>bigint</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
+        representation
+       </entry>
+       <entry><literal>to_hex(2147483647)</literal></entry>
+       <entry><literal>7fffffff</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>

doc_base64_part2_v9.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3d748b660f..22769e5031 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1699,7 +1699,8 @@
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
+        Convert string to <parameter>dest_encoding</parameter>. See <xref
+        linkend="conversion-names"/> for available conversions.
        </entry>
        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
@@ -3284,7 +3285,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, and a few functions
+    which produce strings from other binary inputs.
    </para>
 
    <para>
@@ -3488,7 +3490,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry>
         Convert string to the database encoding.  The original encoding
         is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
+        <parameter>string</parameter> must be valid in this encoding. See
+        <xref linkend="conversion-names"/> for available conversions.
        </entry>
        <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
        <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>

#16

Karl O. Pinc

kop@meme.com

over 6 years ago

In reply to: Karl O. Pinc (#15)

3 attachment(s)

Re: Patch to document base64 encoding

Er, ping. Nobody has reviewed the latest patchs.
They still apply to master...

I am re-attaching the patches. See descriptions
below.

On Mon, 11 Mar 2019 15:32:14 -0500
"Karl O. Pinc" <kop@meme.com> wrote:

On Sun, 10 Mar 2019 08:15:35 +0100 (CET)
Fabien COELHO <coel

What's causing problems here is that the encode and decode
functions are listed in both the string functions section
and the binary functions section. A related but not-relevant
problem is that there are functions listed in the string
function section which take binary input.

I asked about this on IRC and the brief reply was
unflattering to the existing documentation.

So I'm going to fix this also. 3 patches attached:

doc_base64_part1_v9.patch

This moves functions taking bytea and other non-string
input into the binary string section, and vice versa.
Eliminates duplicate encode() and decode() documentation.

Affects: convert(bytea, name, name)
convert_from(bytea, name)
encode(bytea, text)
length(bytea, name)
quote_nullable(anytype)
to_hex(int or bigint)
decode(text, text)

Only moves, eliminates duplicates, and adjusts indentation.

doc_base64_part2_v9.patch

Cleanup wording after moving functions between sections.

doc_base64_part3_v9.patch

Documents base64, hex, and escape encode() and decode()
formats.

"The string and the binary encode and decode functions..."
sentence looks strange to me, especially with the English
article that I do not really master, so maybe it is ok. I'd have
written something more straightforward, eg: "Functions encode
and decode support the following encodings:",

It is an atypical construction because I want to draw attention
that this is documentation not only for the encode() and decode()
in section 9.4. String Functions and Operators but also for the
encode() and decode in section 9.5. Binary String Functions and
Operators. Although I can't think of a better approach it makes me
uncomfortable that documentation written in one section applies
equally to functions in a different section.

People coming from the binary doc would have no reason to look at
the string paragraph anyway.

Do you think it would be useful to hyperlink the word "binary"
to section 9.5?

Hmmm... I think that the link is needed in the other direction.

I'm not sure what you mean here or if it's still relevant.

I'd suggest (1) to use a simpler and direct sentence in the string
section, (2) to simplify/shorten the in cell description in the
binary section, and (3) to add an hyperlink from the binary section
which would point to the expanded explanation in the string section.

The idiomatic phrasing would be "Both the string and the binary
encode and decode functions..." but the word "both" adds
no information. Shorter is better.

Possibly, although "Both" would insist on the fact that it applies
to the two variants, which was your intention.

I think this is no longer relevant. Although I'm not sure what
you mean by 3. The format names already hyperlink back to the
string docs.

and also I'd use a direct "Function
<...>decode</...> ..." rather than "The
<function>decode</function> function ..." (twice).

The straightforward English would be "Decode accepts...". The
problem is that this begins the sentence with the name of a
function. This does not work very well when the function name is
all lower case, and can have other problems where clarity is lost
depending on documentation output formatting.

Yep.

I don't see a better approach.

I suggested "Function <>decode</> ...", which is the kind of thing
we do in academic writing to improve precision, because I thought it
could be better:-)

"Function <>decode</> ..." just does not work in English.

Maybe I'd use the exact same grammatical structure for all 3
cases, starting with "The <>whatever</> encoding converts bla
bla bla" instead of varying the sentences.

Agreed. Good idea. The first paragraph of each term has to
do with encoding and the second with decoding.

Uniformity in starting the second paragraphs helps make
this clear, even though the first paragraphs are not uniform.
With this I am not concerned that the first paragraphs
do not have a common phrasing that's very explicit about
being about encoding.

Adjusted.

Cannot see it fully in the v8 patch:

- The <literal>base64</literal> encoding is
- <literal>hex</literal> represents
- <literal>escape</literal> converts

I did only the decode paras. I guess no reason not to make
the first paras uniform as well. Done.

I also alphabetized by format name.

I hope that 3 patches will make review easier.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_part1_v9.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index da0f305981..5b3bc2496e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1795,48 +1795,6 @@
       <row>
        <entry>
         <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
          <primary>convert_to</primary>
         </indexterm>
         <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
@@ -1868,26 +1826,6 @@
       </row>
 
       <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
-      <row>
        <entry id="format">
         <indexterm>
          <primary>format</primary>
@@ -1955,19 +1893,6 @@
       </row>
 
       <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>lpad</primary>
@@ -2133,18 +2058,6 @@
       </row>
 
       <row>
-       <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Coerce the given value to text and then quote it as a literal;
-        or, if the argument is null, return <literal>NULL</literal>.
-        Embedded single-quotes and backslashes are properly doubled.
-       </entry>
-       <entry><literal>quote_nullable(42.5)</literal></entry>
-       <entry><literal>'42.5'</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>regexp_match</primary>
@@ -2417,22 +2330,6 @@
       <row>
        <entry>
         <indexterm>
-         <primary>to_hex</primary>
-        </indexterm>
-        <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
-        or <type>bigint</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
-        representation
-       </entry>
-       <entry><literal>to_hex(2147483647)</literal></entry>
-       <entry><literal>7fffffff</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
          <primary>translate</primary>
         </indexterm>
         <literal><function>translate(<parameter>string</parameter> <type>text</type>,
@@ -3653,47 +3550,72 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         Remove the longest string containing only bytes appearing in
         <parameter>bytes</parameter> from the start and end of
         <parameter>string</parameter>
-      </entry>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>decode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert string to <parameter>dest_encoding</parameter>.  The
+        original encoding is specified by
+        <parameter>src_encoding</parameter>. The
+        <parameter>string</parameter> must be valid in this encoding.
+        Conversions can be defined by <command>CREATE CONVERSION</command>.
+        Also there are some predefined conversions. See <xref
+        linkend="conversion-names"/> for available conversions.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>encode</primary>
+         <primary>convert_from</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Convert string to the database encoding.  The original encoding
+        is specified by <parameter>src_encoding</parameter>. The
+        <parameter>string</parameter> must be valid in this encoding.
+       </entry>
+       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+      </row>
+
+      <row>
+       <entry>
+         <indexterm>
+          <primary>encode</primary>
+         </indexterm>
+        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
+        <parameter>format</parameter> <type>text</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Encode binary data into a textual representation.  Supported
+        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
+        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
+        doubles backslashes.
+       </entry>
+       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
+       <entry><literal>123\000456</literal></entry>
+      </row>
 
       <row>
        <entry>
@@ -3725,45 +3647,70 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>string</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>string</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>string</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose', 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>string</parameter>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Calculates the MD5 hash of <parameter>string</parameter>,
+        returning the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
+
+      <row>
+       <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Coerce the given value to text and then quote it as a literal;
+        or, if the argument is null, return <literal>NULL</literal>.
+        Embedded single-quotes and backslashes are properly doubled.
+       </entry>
+       <entry><literal>quote_nullable(42.5)</literal></entry>
+       <entry><literal>'42.5'</literal></entry>
+      </row>
 
       <row>
        <entry>
@@ -3856,6 +3803,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>sha512('abc')</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>to_hex</primary>
+        </indexterm>
+        <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
+        or <type>bigint</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
+        representation
+       </entry>
+       <entry><literal>to_hex(2147483647)</literal></entry>
+       <entry><literal>7fffffff</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>

doc_base64_part2_v9.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5b3bc2496e..af7b8284dc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1802,7 +1802,8 @@
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
+        Convert string to <parameter>dest_encoding</parameter>. See <xref
+        linkend="conversion-names"/> for available conversions.
        </entry>
        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
@@ -3387,7 +3388,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, and a few functions
+    which produce strings from other binary inputs.
    </para>
 
    <para>
@@ -3591,7 +3593,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry>
         Convert string to the database encoding.  The original encoding
         is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
+        <parameter>string</parameter> must be valid in this encoding. See
+        <xref linkend="conversion-names"/> for available conversions.
        </entry>
        <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
        <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>

doc_base64_part3_v9.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index af7b8284dc..f81539e341 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1814,13 +1814,18 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
+        <link linkend="encoding-options">Options
+        for <parameter>format</parameter></link> are same as
+        in <function>encode</function>.
        </entry>
        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
        <entry><literal>\x3132330001</literal></entry>
@@ -2366,6 +2371,89 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3602,22 +3690,31 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
       <row>
        <entry>
-         <indexterm>
-          <primary>encode</primary>
-         </indexterm>
+        <indexterm>
+         <primary>encode</primary>
+        </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>.
        </entry>
-       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-       <entry><literal>123\000456</literal></entry>
+       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+       <entry><literal>MTIzAAE=</literal></entry>
       </row>
 
       <row>

#17

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Karl O. Pinc (#16)

Re: Patch to document base64 encoding

Er, ping. Nobody has reviewed the latest patchs.

Next CF is in July, two months away.

You might consider reviewing other people patches, that is expected to
make the overall process work. There are several documentation or
comment patches in the queue.

--
Fabien.

#18

Karl O. Pinc

kop@meme.com

over 6 years ago

In reply to: Fabien COELHO (#17)

Re: Patch to document base64 encoding

On Thu, 9 May 2019 06:50:12 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

You might consider reviewing other people patches, that is expected
to make the overall process work. There are several documentation or
comment patches in the queue.

Understood.

I thought I had built up some reviewing credit, from some time
ago. But perhaps that just made up for previous patches.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#19

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Karl O. Pinc (#15)

Re: Patch to document base64 encoding

Hello Karl,

doc_base64_part1_v9.patch
Only moves, eliminates duplicates, and adjusts indentation.

doc_base64_part2_v9.patch
Cleanup wording after moving functions between sections.

doc_base64_part3_v9.patch
Documents base64, hex, and escape encode() and decode()
formats.

I suggested "Function <>decode</> ...", which is the kind of thing we
do in academic writing to improve precision, because I thought it
could be better:-)

"Function <>decode</> ..." just does not work in English.

It really works in research papers: "Theorem X can be proven by applying
Proposition Y. See Figure 2 for details. Algorithm Z describes whatever,
which is listed in Table W..."

I also alphabetized by format name.

Good:-)

I hope that 3 patches will make review easier.

Not really. I'm reviewing the 3 patches put together rather than each one
individually, which would require more work.

Patch applies cleanly. Doc build ok.

I looked at the html output, and it seems ok, including navigating to
conversions or formats explanations.

This documentation patch is an overall improvement and clarifies things,
including some error conditions.

convert: I'd merge the 2 first sentences to state that if convert from X
to Y. The doc does not say explicitely what happens if a character cannot
be converted. After testing, an error is raised. The example comment could
add ", if possible".

to_hex: add "." at the end of the sentence?

Other descriptions seem ok.

Minor comment: you usually put two spaces between a "." and the first
world of then next sentence, but not always.

--
Fabien.

#20

Karl O. Pinc

kop@karlpinc.com

over 6 years ago

In reply to: Fabien COELHO (#19)

1 attachment(s)

Re: Patch to document base64 encoding

Hi Fabien,

Attached is doc_base64_v10.patch

On Fri, 12 Jul 2019 15:58:21 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I suggested "Function <>decode</> ...", which is the kind of thing
we do in academic writing to improve precision, because I thought
it could be better:-)

"Function <>decode</> ..." just does not work in English.

It really works in research papers: "Theorem X can be proven by
applying Proposition Y. See Figure 2 for details. Algorithm Z
describes whatever, which is listed in Table W..."

I've not thought about it before but I suppose the difference
is between declarative and descriptive, the latter being
more inviting and better allows for flow between sentences.
Otherwise you're writing in bullet points. So it is a
question of balance between specification and narration.
In regular prose you're always going to see the "the"
unless the sentence starts with the name. The trouble
is that we can't start sentences with function names
because of capitalization confusion.

I hope that 3 patches will make review easier.

Not really. I'm reviewing the 3 patches put together rather than each
one individually, which would require more work.

I figured with e.g. a separate patch for moving and alphabetizing
that it'd be easier to check that nothing got lost. Anyhow,
Just one patch this time.

convert: I'd merge the 2 first sentences to state that if convert
from X to Y. The doc does not say explicitely what happens if a
character cannot be converted. After testing, an error is raised. The
example comment could add ", if possible".

Done. Good idea. I reworked the whole paragraph to shorten and
clarify since I was altering it anyway. This does introduce
some inconsistency with wording that appears elsewhere but it seems
worth it because the listentry box was getting overfull.

to_hex: add "." at the end of the sentence?

I left as-is, without a ".". The only consistent rule about
periods in the listentrys seems to be that if there's more than
one sentence then there's periods -- I think. In any case a
lot of them don't have periods and probably don't need
periods. I don't know what to do and since the original did
not have a period it seems better to leave well enough alone.

Minor comment: you usually put two spaces between a "." and the first
world of then next sentence, but not always.

I now always put 2 spaces after the end of a sentence. But
I've only done this where I've changed text, not when
moving pre-existing text around. Again, there seems
to be no consistency in the original. (I believe docbook
redoes all inter-sentence spacing anyway.)

Thanks for the help.

I'll be sure to sign up to review a patch (or patches) when life
permits.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v10.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a25c122ac8..131a63b36c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1795,48 +1795,6 @@
       <row>
        <entry>
         <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
          <primary>convert_to</primary>
         </indexterm>
         <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
@@ -1844,7 +1802,8 @@
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
+        Convert string to <parameter>dest_encoding</parameter>.  See <xref
+        linkend="conversion-names"/> for available conversions.
        </entry>
        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
@@ -1855,39 +1814,24 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
+        <link linkend="encoding-options">Options
+        for <parameter>format</parameter></link> are same as
+        in <function>encode</function>.
        </entry>
        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
        <entry><literal>\x3132330001</literal></entry>
       </row>
 
       <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
-      <row>
        <entry id="format">
         <indexterm>
          <primary>format</primary>
@@ -1955,19 +1899,6 @@
       </row>
 
       <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>lpad</primary>
@@ -2133,18 +2064,6 @@
       </row>
 
       <row>
-       <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Coerce the given value to text and then quote it as a literal;
-        or, if the argument is null, return <literal>NULL</literal>.
-        Embedded single-quotes and backslashes are properly doubled.
-       </entry>
-       <entry><literal>quote_nullable(42.5)</literal></entry>
-       <entry><literal>'42.5'</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>regexp_match</primary>
@@ -2417,22 +2336,6 @@
       <row>
        <entry>
         <indexterm>
-         <primary>to_hex</primary>
-        </indexterm>
-        <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
-        or <type>bigint</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
-        representation
-       </entry>
-       <entry><literal>to_hex(2147483647)</literal></entry>
-       <entry><literal>7fffffff</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
          <primary>translate</primary>
         </indexterm>
         <literal><function>translate(<parameter>string</parameter> <type>text</type>,
@@ -2468,6 +2371,89 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3466,7 +3452,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, and a few functions
+    which produce strings from other binary inputs.
    </para>
 
    <para>
@@ -3629,47 +3616,81 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         Remove the longest string containing only bytes appearing in
         <parameter>bytes</parameter> from the start and end of
         <parameter>string</parameter>
-      </entry>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>decode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert string from <parameter>src_encoding</parameter>
+        to <parameter>dest_encoding</parameter>, or raise an error.  See
+        <xref linkend="conversion-names"/> for the predefined conversions.
+        The <parameter>string</parameter> must be valid
+        in <parameter>src_encoding</parameter>.  New conversions may be
+        defined with <command>CREATE CONVERSION</command>.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
-     <row>
-      <entry>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>convert_from</primary>
+        </indexterm>
+        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Convert string to the database encoding.  The original encoding
+        is specified by <parameter>src_encoding</parameter>. The
+        <parameter>string</parameter> must be valid in this encoding.  See
+        <xref linkend="conversion-names"/> for available conversions.
+       </entry>
+       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+      </row>
+
+      <row>
+       <entry>
         <indexterm>
          <primary>encode</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
+        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
+        <parameter>format</parameter> <type>text</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Encode binary data into a textual representation.  Supported
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>.
+       </entry>
+       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+       <entry><literal>MTIzAAE=</literal></entry>
+      </row>
 
       <row>
        <entry>
@@ -3701,45 +3722,70 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>string</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>string</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>string</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose', 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>string</parameter>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Calculates the MD5 hash of <parameter>string</parameter>,
+        returning the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
+
+      <row>
+       <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Coerce the given value to text and then quote it as a literal;
+        or, if the argument is null, return <literal>NULL</literal>.
+        Embedded single-quotes and backslashes are properly doubled.
+       </entry>
+       <entry><literal>quote_nullable(42.5)</literal></entry>
+       <entry><literal>'42.5'</literal></entry>
+      </row>
 
       <row>
        <entry>
@@ -3832,6 +3878,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>sha512('abc')</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>to_hex</primary>
+        </indexterm>
+        <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
+        or <type>bigint</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
+        representation
+       </entry>
+       <entry><literal>to_hex(2147483647)</literal></entry>
+       <entry><literal>7fffffff</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>

#21

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Karl O. Pinc (#20)

Re: Patch to document base64 encoding

Hello Karl,

It really works in research papers: "Theorem X can be proven by
applying Proposition Y. See Figure 2 for details. Algorithm Z
describes whatever, which is listed in Table W..."

I've not thought about it before but I suppose the difference is between
declarative and descriptive, the latter being more inviting and better
allows for flow between sentences. Otherwise you're writing in bullet
points. So it is a question of balance between specification and
narration. In regular prose you're always going to see the "the" unless
the sentence starts with the name. The trouble is that we can't start
sentences with function names because of capitalization confusion.

Sure. For me "Function" would work as a title on its name, as in "Sir
Samuel", "Doctor Frankenstein", "Mister Bean", "Professor Layton"...
"Function sqrt" and solves the casing issue on the function name which is
better not capitalized.

--
Fabien.

#22

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Karl O. Pinc (#20)

Re: Patch to document base64 encoding

Hello Karl,

Attached is doc_base64_v10.patch

Patch applies cleanly. Doc gen ok.

The patch clarifies the documentation about encode/decode and other
text/binary string conversion functions.

No further comments beyond the title thing (Function x) already discussed,
which is not a stopper.

Patch marked as ready.

--
Fabien.

#23

Karl O. Pinc

kop@karlpinc.com

over 6 years ago

In reply to: Fabien COELHO (#22)

Re: Patch to document base64 encoding

On Mon, 15 Jul 2019 23:00:55 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

The patch clarifies the documentation about encode/decode and other
text/binary string conversion functions.

Other notable changes:

Corrects categorization of functions as string or binary.

Reorders functions alphabetically by function name.

Thanks very much Fabien.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#24

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Karl O. Pinc (#23)

Re: Patch to document base64 encoding

"Karl O. Pinc" <kop@karlpinc.com> writes:

On Mon, 15 Jul 2019 23:00:55 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

The patch clarifies the documentation about encode/decode and other
text/binary string conversion functions.

Other notable changes:
Corrects categorization of functions as string or binary.
Reorders functions alphabetically by function name.

So I took a look at this, expecting that after so much discussion it
ought to just be committable ... but I am befuddled by your choices
about which functions to move where. It seems entirely crazy that
encode() and decode() are no longer in the same section, likewise that
convert_from() and convert_to() aren't documented together anymore.
I'm not sure what is the right dividing line between string and binary
functions, but I don't think that anyone is going to find this
division helpful.

I do agree that documenting some functions twice is a bad plan,
so we need to clean this up somehow.

After some thought, it seems like maybe a workable approach would be
to consider that all conversion functions going between text and
bytea belong in the binary-string-functions section. I think it's
reasonable to say that plain "string functions" just means stuff
dealing with text.

Possibly we could make a separate table in the binary-functions
section just for conversions, although that feels like it might be
overkill.

While we're on the subject, Table 9.11 (conversion names) seems
entirely misplaced, and I don't just mean that it would need to
migrate to the binary-functions page. I don't think it belongs
in func.sgml at all. Isn't it pretty duplicative of Table 23.2
(Client/Server Character Set Conversions)? I think we should
unify it with that table, or at least put it next to that one.
Perhaps that's material for a separate patch though.

regards, tom lane

#25

Karl O. Pinc

kop@karlpinc.com

over 6 years ago

In reply to: Tom Lane (#24)

Re: Patch to document base64 encoding

On Tue, 30 Jul 2019 11:40:03 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Karl O. Pinc" <kop@karlpinc.com> writes:

On Mon, 15 Jul 2019 23:00:55 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

The patch clarifies the documentation about encode/decode and
other text/binary string conversion functions.

Other notable changes:
Corrects categorization of functions as string or binary.
Reorders functions alphabetically by function name.

So I took a look at this, expecting that after so much discussion it
ought to just be committable ...

It started simple, just changing the base64 function descriptions,
but critique drew in additional issues.

but I am befuddled by your choices
about which functions to move where.

The grouping is by the data type on which each function operates,
the data type of the input.

If there's to be 2 categories, as in the
existing docs, it seems to me that you have to categorize either by
the data type input or data type output. To categorize by input
and output together would result 4 (or more?) categories, which
would be even crazier.

It seems entirely crazy that
encode() and decode() are no longer in the same section, likewise that
convert_from() and convert_to() aren't documented together anymore.

Awkward, yes. But findable if you know what the categories are.

I suppose there could be 3 different categories: those that input
and output strings, those that input and output binary, and those
that convert -- inputting one data type and outputting another.

I'm not sure that this would really address the issue of documenting,
say encode() and decode() together. It pretty much makes sense to
alphabetize the functions _within_ each category, because that's
about the only easily defined way to do it. Going "by feel" and
putting encode() and decode() together raises the question of
where they should be together in the overall ordering within
the category.

I'm not sure what is the right dividing line between string and binary
functions, but I don't think that anyone is going to find this
division helpful.

Maybe there's a way to make more clear what the categories are?
I could be explicit in the description of the section.

I do agree that documenting some functions twice is a bad plan,
so we need to clean this up somehow.

After some thought, it seems like maybe a workable approach would be
to consider that all conversion functions going between text and
bytea belong in the binary-string-functions section. I think it's
reasonable to say that plain "string functions" just means stuff
dealing with text.

Ok. Should the section title remain unchanged?
"Binary String Functions and Operators"

I think the summary description of the section will need
a little clarification.

Possibly we could make a separate table in the binary-functions
section just for conversions, although that feels like it might be
overkill.

I have no good answers. An advantage to a separate section
for conversions is that you _might_ be able to pair the functions,
so that encode() and decode() do show up right next to each other.

I'm not sure exactly how to structure "pairing". I would have to
play around and see what might look good.

While we're on the subject, Table 9.11 (conversion names) seems
entirely misplaced, and I don't just mean that it would need to
migrate to the binary-functions page. I don't think it belongs
in func.sgml at all. Isn't it pretty duplicative of Table 23.2
(Client/Server Character Set Conversions)? I think we should
unify it with that table, or at least put it next to that one.
Perhaps that's material for a separate patch though.

I don't know. But it does seem something that can be addressed
in isolation and suitable for it's own patch.

Thanks for the help. I will wait for a response to this
before submitting another patch, just in case someone sees any
ideas here to be followed up on.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#26

Fabien COELHO

coelho@cri.ensmp.fr

over 6 years ago

In reply to: Karl O. Pinc (#25)

Re: Patch to document base64 encoding

My 0.02 ï¿œ

It seems entirely crazy that encode() and decode() are no longer in the
same section, likewise that convert_from() and convert_to() aren't
documented together anymore.

Awkward, yes. But findable if you know what the categories are.

I suppose there could be 3 different categories: those that input
and output strings, those that input and output binary, and those
that convert -- inputting one data type and outputting another.

Personnaly, I'd be ok with having a separate "conversion function" table,
and also with Tom suggestion to have string functions with "only simple
string" functions, and if any binary appears it is moved into the binary
section.

--
Fabien.

#27

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Karl O. Pinc (#25)

Re: Patch to document base64 encoding

On Wed, Jul 31, 2019 at 6:27 AM Karl O. Pinc <kop@karlpinc.com> wrote:

On Tue, 30 Jul 2019 11:40:03 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:
[review]

Thanks for the help. I will wait for a response to this
before submitting another patch, just in case someone sees any
ideas here to be followed up on.

Based on the above, I set this back to "Waiting on Author", and moved
it to the September CF.

--
Thomas Munro
https://enterprisedb.com

#28

Karl O. Pinc

kop@karlpinc.com

over 6 years ago

In reply to: Fabien COELHO (#26)

Re: Patch to document base64 encoding

On Tue, 30 Jul 2019 23:44:49 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Personnaly, I'd be ok with having a separate "conversion function"
table, and also with Tom suggestion to have string functions with
"only simple string" functions, and if any binary appears it is moved
into the binary section.

I'll make a "conversion function" table and put it in the binary
section.

But I'm not happy with putting any function that works with
bytea into the binary string section. This would mean moving,
say, length() out of the regular string section. There's a
lot of functions that work on both string and bytea inputs
and most (not all, see below) are functions that people
typically associate with string data.

What I think I'd like to do is add a column to the table
in the string section that says whether or not the function
works with both string and bytea. The result would be:

The hash functions (md5(), sha256(), etc.) would move
to the string section, because they work on both strings
and binary data. So the binary function table would
considerably shorten.

There would be a new table for conversions between
bytea and string (both directions). This would
be placed in the binary string section.

So the binary string section would be "just simple bytea", plus
conversion functions. Kind of the opposite of Tom's
suggestion.

Please let me know what you think. Thanks.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#29

Tom Lane

tgl@sss.pgh.pa.us

over 6 years ago

In reply to: Karl O. Pinc (#28)

Re: Patch to document base64 encoding

"Karl O. Pinc" <kop@karlpinc.com> writes:

But I'm not happy with putting any function that works with
bytea into the binary string section. This would mean moving,
say, length() out of the regular string section. There's a
lot of functions that work on both string and bytea inputs
and most (not all, see below) are functions that people
typically associate with string data.

Well, there are two different length() functions --- length(text)
and length(bytea) are entirely different things, they don't even
measure in the same units. I think documenting them separately
is the right thing to do. I don't really have a problem with
repeating the entries for other functions that exist in both
text and bytea variants, either. There aren't that many.

What I think I'd like to do is add a column to the table
in the string section that says whether or not the function
works with both string and bytea.

Meh. Seems like what that would mostly do is ensure that
neither page is understandable on its own.

regards, tom lane

#30

Chapman Flack

chap@anastigmatix.net

over 6 years ago

In reply to: Karl O. Pinc (#28)

Re: Patch to document base64 encoding

On 8/2/19 10:32 AM, Karl O. Pinc wrote:

But I'm not happy with putting any function that works with
bytea into the binary string section. This would mean moving,
say, length() out of the regular string section.

I'm not sure why. The bytea section already has an entry for its
length() function.

There are also length() functions for bit, character, lseg, path,
tsvector....

I don't really think of those as "a length() function" that works
on all those types. I think of a variety of types, many of which
offer a length() function.

That seems to be reflected in the way the docs are arranged.

Regards,
-Chap

#31

Karl O. Pinc

kop@karlpinc.com

over 6 years ago

In reply to: Tom Lane (#29)

Re: Patch to document base64 encoding

On Fri, 02 Aug 2019 10:44:43 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't really have a problem with
repeating the entries for other functions that exist in both
text and bytea variants, either.

Ok. Thanks. I'll repeat entries then.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#32

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Karl O. Pinc (#31)

Re: Patch to document base64 encoding

On 2019-Aug-02, Karl O. Pinc wrote:

On Fri, 02 Aug 2019 10:44:43 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't really have a problem with
repeating the entries for other functions that exist in both
text and bytea variants, either.

Ok. Thanks. I'll repeat entries then.

Hello Karl,

Are you submitting an updated version soon?

Tom, you're still listed as committer for this patch. Just a heads up.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#33

Karl O. Pinc

kop@karlpinc.com

over 6 years ago

In reply to: Alvaro Herrera (#32)

Re: Patch to document base64 encoding

Hi Alvaro,

On Mon, 2 Sep 2019 13:56:28 -0400
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Are you submitting an updated version soon?

I don't expect to be able to make a new patch for at
least another week.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#34

Karl O. Pinc

kop@meme.com

about 6 years ago

In reply to: Karl O. Pinc (#33)

1 attachment(s)

Re: Patch to document base64 encoding

Hi,

Attached is doc_base64_v11.patch

This addresses Tom's concerns. Functions
that operate on both strings and bytea
(e.g. length(text) and length(bytea))
are documented separately, one with
string functions and one with binary
string functions.

In this iteration I have also:

Added a sub-section for the functions
which convert between text and bytea.

Added some index entries.

Provided a link in the hash functions to
the text about why md5() returns text
not bytea.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v11.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..aac9065e16 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1432,6 +1432,13 @@
    </para>
 
    <para>
+     Functions which convert, both to and from, strings and
+     the <type>bytea</type> type
+     are <link linkend="functions-convertingstringbinary">documented
+     separately</link>.
+   </para>
+
+   <para>
     <acronym>SQL</acronym> defines some string functions that use
     key words, rather than commas, to separate
     arguments.  Details are in
@@ -1793,101 +1800,6 @@
       </row>
 
       <row>
-       <entry>
-        <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_to</primary>
-        </indexterm>
-        <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
-       </entry>
-       <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
-       <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-        <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
-       </entry>
-       <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
-       <entry><literal>\x3132330001</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
-      <row>
        <entry id="format">
         <indexterm>
          <primary>format</primary>
@@ -1955,19 +1867,6 @@
       </row>
 
       <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
-      <row>
        <entry>
         <indexterm>
          <primary>lpad</primary>
@@ -2014,7 +1913,7 @@
         </indexterm>
         <literal><function>md5(<parameter>string</parameter>)</function></literal>
        </entry>
-       <entry><type>text</type></entry>
+       <entry><type>text</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         Calculates the MD5 hash of <parameter>string</parameter>,
         returning the result in hexadecimal
@@ -2333,6 +2232,66 @@
       <row>
        <entry>
         <indexterm>
+         <primary>sha224</primary>
+        </indexterm>
+        <literal><function>sha224(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-224 hash
+       </entry>
+       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha256</primary>
+        </indexterm>
+        <literal><function>sha256(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-256 hash
+       </entry>
+       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha384</primary>
+        </indexterm>
+        <literal><function>sha384(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-384 hash
+       </entry>
+       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha512</primary>
+        </indexterm>
+        <literal><function>sha512(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-512 hash
+       </entry>
+       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
          <primary>split_part</primary>
         </indexterm>
         <literal><function>split_part(<parameter>string</parameter> <type>text</type>,
@@ -3466,7 +3425,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, functions which produce
+    strings from other binary inputs, and functions which convert
+    between <type>bytea</type> and strings.
    </para>
 
    <para>
@@ -3502,8 +3463,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
      <tbody>
       <row>
-       <entry><literal><parameter>string</parameter> <literal>||</literal>
-        <parameter>string</parameter></literal></entry>
+       <entry><literal><parameter>bytea</parameter> <literal>||</literal>
+        <parameter>bytea</parameter></literal></entry>
        <entry> <type>bytea</type> </entry>
        <entry>
         String concatenation
@@ -3519,9 +3480,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
       <row>
        <entry>
         <indexterm>
+         <primary>bit_length</primary>
+        </indexterm>
+        <literal><function>bit_length(<parameter>bytea</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>Number of bits in binary string</entry>
+       <entry><literal>bit_length('jo\000se':bytea)</literal></entry>
+       <entry><literal>40</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
          <primary>octet_length</primary>
         </indexterm>
-        <literal><function>octet_length(<parameter>string</parameter>)</function></literal>
+        <literal><function>octet_length(<parameter>bytea</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Number of bytes in binary string</entry>
@@ -3534,7 +3508,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>overlay</primary>
         </indexterm>
-        <literal><function>overlay(<parameter>string</parameter> placing <parameter>string</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>overlay(<parameter>bytea</parameter> placing <parameter>bytea</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3549,7 +3523,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>position</primary>
         </indexterm>
-        <literal><function>position(<parameter>substring</parameter> in <parameter>string</parameter>)</function></literal>
+        <literal><function>position(<parameter>byteasubstring</parameter> in <parameter>bytea</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Location of specified substring</entry>
@@ -3562,7 +3536,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>substring</primary>
         </indexterm>
-        <literal><function>substring(<parameter>string</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>substring(<parameter>bytea</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3579,7 +3553,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         </indexterm>
         <literal><function>trim(<optional>both</optional>
         <parameter>bytes</parameter> from
-        <parameter>string</parameter>)</function></literal>
+        <parameter>bytea</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3590,6 +3564,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>trim('\000\001'::bytea from '\000Tom\001'::bytea)</literal></entry>
        <entry><literal>Tom</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <literal><function>trim(<optional>leading | trailing
+        | both</optional> <optional>from</optional>
+        <parameter>bytea</parameter>
+        <optional>, <parameter>bytes</parameter></optional>
+        )</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Non-standard syntax for <function>trim()</function>
+       </entry>
+       <entry><literal>trim(both from 'yxTomxx'::bytea, 'xyz'::bytea)</literal></entry>
+       <entry><literal>Tom</literal></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -3629,58 +3619,44 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         Remove the longest string containing only bytes appearing in
         <parameter>bytes</parameter> from the start and end of
         <parameter>string</parameter>
-      </entry>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
-
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>encode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert string from <parameter>src_encoding</parameter>
+        to <parameter>dest_encoding</parameter>, or raise an error.  See
+        <xref linkend="conversion-names"/> for the predefined conversions.
+        The <parameter>string</parameter> must be valid
+        in <parameter>src_encoding</parameter>.  New conversions may be
+        defined with <command>CREATE CONVERSION</command>.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>get_bit</primary>
         </indexterm>
-        <literal><function>get_bit(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_bit(<parameter>bytea</parameter>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract bit from string
+         Extract bit from binary string
        </entry>
        <entry><literal>get_bit('Th\000omas'::bytea, 45)</literal></entry>
        <entry><literal>1</literal></entry>
@@ -3691,67 +3667,80 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>get_byte</primary>
         </indexterm>
-        <literal><function>get_byte(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_byte(<parameter>bytea</parameter>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract byte from string
+        Extract byte from binary string
        </entry>
        <entry><literal>get_byte('Th\000omas'::bytea, 4)</literal></entry>
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>bytea</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>string</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>string</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose'::bytea, 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>string</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>text</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        Calculates the MD5 hash of <parameter>string</parameter>, returning
+        the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>set_bit</primary>
         </indexterm>
-        <literal><function>set_bit(<parameter>string</parameter>,
+        <literal><function>set_bit(<parameter>bytea</parameter>,
         <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set bit in string
+        Set bit in binary string
        </entry>
        <entry><literal>set_bit('Th\000omas'::bytea, 45, 0)</literal></entry>
        <entry><literal>Th\000omAs</literal></entry>
@@ -3762,12 +3751,12 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>set_byte</primary>
         </indexterm>
-        <literal><function>set_byte(<parameter>string</parameter>,
+        <literal><function>set_byte(<parameter>bytea</parameter>,
         <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set byte in string
+        Set byte in binary string
        </entry>
        <entry><literal>set_byte('Th\000omas'::bytea, 4, 64)</literal></entry>
        <entry><literal>Th\000o@as</literal></entry>
@@ -3780,11 +3769,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         </indexterm>
         <literal><function>sha224(<type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         SHA-224 hash
        </entry>
-       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>sha224('abc'::bytea)</literal></entry>
        <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
       </row>
 
@@ -3795,11 +3784,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         </indexterm>
         <literal><function>sha256(<type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         SHA-256 hash
        </entry>
-       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>sha256('abc'::bytea)</literal></entry>
        <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
       </row>
 
@@ -3810,11 +3799,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         </indexterm>
         <literal><function>sha384(<type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         SHA-384 hash
        </entry>
-       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>sha384('abc'::bytea)</literal></entry>
        <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
       </row>
 
@@ -3825,13 +3814,29 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         </indexterm>
         <literal><function>sha512(<type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
-        SHA-512 hash
+         SHA-512 hash
        </entry>
-       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>sha512('abc'::bytea)</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>substr</primary>
+        </indexterm>
+        <literal><function>substr(<parameter>bytea</parameter>, <parameter>from</parameter> <type>bytea</type> <optional>, <parameter>count</parameter></optional>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Extract substring (same as
+        <literal>substring(<parameter>bytea</parameter> from <parameter>from</parameter> <type>bytea</type> for <parameter>count</parameter>)</literal>)
+       </entry>
+       <entry><literal>substr('alphabet', 3, 2)</literal></entry>
+       <entry><literal>ph</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>
@@ -3844,7 +3849,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    the first byte, and bit 15 is the most significant bit of the second byte.
   </para>
 
-  <para>
+  <para id="functions-hashnote">
    Note that for historic reasons, the function <function>md5</function>
    returns a hex-encoded value of type <type>text</type> whereas the SHA-2
    functions return type <type>bytea</type>.  Use the functions
@@ -3858,6 +3863,210 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    <xref linkend="functions-aggregate"/> and the large object functions
    in <xref linkend="lo-funcs"/>.
   </para>
+
+  <sect2 id="functions-convertingstringbinary">
+   <title>String to Binary and Binary to String Conversion</title>
+
+   <indexterm>
+    <primary>function</primary>
+    <secondary>converting between string and binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>character string</primary>
+     <secondary>converting to binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>binary string</primary>
+     <secondary>converting to character string</secondary>
+   </indexterm>
+
+   <table id="functions-binarystringconversions">
+    <title>Binary/String Conversion Functions</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+       <entry>Example</entry>
+       <entry>Result</entry>
+      </row>
+     </thead>
+
+     <tbody>
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_from</primary>
+         </indexterm>
+         <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
+         <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Convert string to the database encoding.  The original encoding
+         is specified by <parameter>src_encoding</parameter>. The
+         <parameter>string</parameter> must be valid in this encoding.  See
+         <xref linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+        <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_to</primary>
+         </indexterm>
+         <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
+         <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Convert string to <parameter>dest_encoding</parameter>.  See <xref
+         linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
+        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>encode</primary>
+         </indexterm>
+         <indexterm>
+          <primary>base64 encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>hex encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>escape encoding</primary>
+         </indexterm>
+         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Encode binary data into a textual representation.  Supported
+         formats are:
+         <link linkend="base64-encoding"><literal>base64</literal></link>,
+         <link linkend="escape-encoding"><literal>escape</literal></link>,
+         <link linkend="hex-encoding"><literal>hex</literal></link>.
+        </entry>
+        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+        <entry><literal>MTIzAAE=</literal></entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>decode</primary>
+         </indexterm>
+         <indexterm>
+           <primary>base64 encoding</primary>
+         </indexterm>
+         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Decode binary data from textual representation in <parameter>string</parameter>.
+         <link linkend="encoding-options">Options
+         for <parameter>format</parameter></link> are same as
+         in <function>encode</function>.
+        </entry>
+        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
+        <entry><literal>\x3132330001</literal></entry>
+       </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+  </sect2>
  </sect1>

#35

Fabien COELHO

coelho@cri.ensmp.fr

about 6 years ago

In reply to: Karl O. Pinc (#34)

Re: Patch to document base64 encoding

Hello Karl,

Attached is doc_base64_v11.patch

Patch applies cleanly and compiles.

I'm in favor of moving and reorganizing these function descriptions, as
they are somehow scattered with a unclear logic when you are looking for
them.

  +       <entry><literal><parameter>bytea</parameter> <literal>||</literal>
  +        <parameter>bytea</parameter></literal></entry>
          <entry> <type>bytea</type> </entry>
          <entry>
           String concatenation

Bytea concatenation?

I'm not keen on calling the parameter the name of its type. I'd suggest to
keep "string" as a name everywhere, which is not a type name in Pg.

The functions descriptions are not homogeneous. Some have parameter name &
type "btrim(string bytea, bytes bytea)" and others only type or parameter
with tagged as a parameter "get_bit(bytea, offset)" (first param),
"sha224(bytea)".

I'd suggest to be consistent, eg use "string bytea" everywhere
appropriate.

--
Fabien.

#36

Karl O. Pinc

kop@meme.com

about 6 years ago

In reply to: Fabien COELHO (#35)

1 attachment(s)

Re: Patch to document base64 encoding

Hello Fabien,

On Sun, 5 Jan 2020 12:48:59 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I'm in favor of moving and reorganizing these function descriptions,
as they are somehow scattered with a unclear logic when you are
looking for them.

I assume by this you mean you are happy with the organization done
by the patch.

For review (I think I've got this right) the organizational
changes are:

The changes suggested by Tom where 2
functions with the same name, one of which takes string
arguments and the other of which takes bytea arguments
now show up both in the doc section on string functions and
in the doc section on bytea functions.

I believe I also alphabetized the binary function ordering.

And this patch introduces a separate table/sub-section for
functions which convert between binary and string.)
There are much-expanded descriptions of encode()
and decode(). (Which is how this patch series
started, explaining base64 encoding/decoding.)

FYI. There is also an unusual usage of a hyperlinked
asterisk following the returned datatype of the hash
functions. The hyperlink leads to the historical
note on the datatype used for the md5() function v.s.
the other hash functions.

+       <entry><literal><parameter>bytea</parameter>
<literal>||</literal>
+        <parameter>bytea</parameter></literal></entry>
<entry> <type>bytea</type> </entry>
<entry>
String concatenation

Bytea concatenation?

Done. (Could just say "Concatenation" I suppose. But "Bytea
concatenation" does not hurt and would be nice if you
ever looked at the tables for string operators and bytea
operators side-by-side.)

I'm not keen on calling the parameter the name of its type. I'd
suggest to keep "string" as a name everywhere, which is not a type
name in Pg.

The functions descriptions are not homogeneous. Some have parameter
name & type "btrim(string bytea, bytes bytea)" and others only type
or parameter with tagged as a parameter "get_bit(bytea,
offset)" (first param), "sha224(bytea)".

I'd suggest to be consistent, eg use "string bytea" everywhere
appropriate.

Ok. Done. Except that I've left the encode() function
as encode(data bytea, format text) because the whole
point is to convert _to_ a string/text datatype
from something that's _not_ a string. Calling
the input a string just seems wrong. This inconsistency seems ok
because encode() is in the table of string <-> bytea functions,
away from the other bytea functions.

If you're interested, another possibility would be the
consistent use of "data bytea" everywhere. I like this
choice because it works well to write
encode(<parameter>data</parameter> bytea,
<parameter>format</parameter text), and probably
works well in other places too. But then the word
"string" does not really fit in a lot of the descriptions.
So this choice would involve re-writing descriptions so
that the existing description:

btrim(<parameter>string</parameter> bytea,
<parameter>bytes</parameter> bytea)

Remove the longest string containing only bytes appearing in
<parameter>bytes</parameter> from the start and end of
<parameter>string</parameter>

Would change to (say):

btrim(<parameter>data</parameter> bytea,
<parameter>bytes</parameter> bytea)

Remove the longest contiguous sequence of bytes containing only
those bytes appearing in <parameter>bytes</parameter>
from the start and end of <parameter>data</parameter>

The trouble with using "data bytea" is that there might
need to be adjustments to the word "string" elsewhere in
the section, not just in the descriptions.

Let me know if you'd prefer "data bytea" to "string bytea"
and consequent frobbing of descriptions. That might be
out-of-scope for this patch. (Which is already
a poster-child for feature-creep.)

Attached is doc_base64_v12.patch.

Regards,

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v12.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 57a1539506..2e45bf71c8 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1431,6 +1431,13 @@
     natively for the bit-string types.
    </para>
 
+   <para>
+     Functions which convert, both to and from, strings and
+     the <type>bytea</type> type
+     are <link linkend="functions-convertingstringbinary">documented
+     separately</link>.
+   </para>
+
    <para>
     <acronym>SQL</acronym> defines some string functions that use
     key words, rather than commas, to separate
@@ -1792,101 +1799,6 @@
        <entry><literal>abcde,2,22</literal></entry>
       </row>
 
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_to</primary>
-        </indexterm>
-        <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
-       </entry>
-       <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
-       <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-        <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
-       </entry>
-       <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
-       <entry><literal>\x3132330001</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
       <row>
        <entry id="format">
         <indexterm>
@@ -1954,19 +1866,6 @@
        <entry><literal>4</literal></entry>
       </row>
 
-      <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
       <row>
        <entry>
         <indexterm>
@@ -2014,7 +1913,7 @@
         </indexterm>
         <literal><function>md5(<parameter>string</parameter>)</function></literal>
        </entry>
-       <entry><type>text</type></entry>
+       <entry><type>text</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         Calculates the MD5 hash of <parameter>string</parameter>,
         returning the result in hexadecimal
@@ -2330,6 +2229,66 @@
        <entry><literal>test</literal></entry>
       </row>
 
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha224</primary>
+        </indexterm>
+        <literal><function>sha224(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-224 hash
+       </entry>
+       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha256</primary>
+        </indexterm>
+        <literal><function>sha256(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-256 hash
+       </entry>
+       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha384</primary>
+        </indexterm>
+        <literal><function>sha384(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-384 hash
+       </entry>
+       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha512</primary>
+        </indexterm>
+        <literal><function>sha512(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        SHA-512 hash
+       </entry>
+       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
+      </row>
+
       <row>
        <entry>
         <indexterm>
@@ -3466,7 +3425,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, functions which produce
+    strings from other binary inputs, and functions which convert
+    between <type>bytea</type> and strings.
    </para>
 
    <para>
@@ -3502,11 +3463,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
      <tbody>
       <row>
-       <entry><literal><parameter>string</parameter> <literal>||</literal>
-        <parameter>string</parameter></literal></entry>
+       <entry><literal><parameter>bytea</parameter> <literal>||</literal>
+        <parameter>bytea</parameter></literal></entry>
        <entry> <type>bytea</type> </entry>
        <entry>
-        String concatenation
+        Bytea concatenation
         <indexterm>
          <primary>binary string</primary>
          <secondary>concatenation</secondary>
@@ -3516,12 +3477,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>\\Post'gres\000</literal></entry>
       </row>
 
+      <row>
+       <entry>
+        <indexterm>
+         <primary>bit_length</primary>
+        </indexterm>
+        <literal><function>bit_length(<parameter>bytea</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>Number of bits in binary string</entry>
+       <entry><literal>bit_length('jo\000se':bytea)</literal></entry>
+       <entry><literal>40</literal></entry>
+      </row>
+
       <row>
        <entry>
         <indexterm>
          <primary>octet_length</primary>
         </indexterm>
-        <literal><function>octet_length(<parameter>string</parameter>)</function></literal>
+        <literal><function>octet_length(<parameter>bytea</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Number of bytes in binary string</entry>
@@ -3534,7 +3508,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>overlay</primary>
         </indexterm>
-        <literal><function>overlay(<parameter>string</parameter> placing <parameter>string</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>overlay(<parameter>bytea</parameter> placing <parameter>bytea</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3549,7 +3523,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>position</primary>
         </indexterm>
-        <literal><function>position(<parameter>substring</parameter> in <parameter>string</parameter>)</function></literal>
+        <literal><function>position(<parameter>byteasubstring</parameter> in <parameter>bytea</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Location of specified substring</entry>
@@ -3562,7 +3536,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>substring</primary>
         </indexterm>
-        <literal><function>substring(<parameter>string</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>substring(<parameter>bytea</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3579,7 +3553,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         </indexterm>
         <literal><function>trim(<optional>both</optional>
         <parameter>bytes</parameter> from
-        <parameter>string</parameter>)</function></literal>
+        <parameter>bytea</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3590,6 +3564,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>trim('\000\001'::bytea from '\000Tom\001'::bytea)</literal></entry>
        <entry><literal>Tom</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <literal><function>trim(<optional>leading | trailing
+        | both</optional> <optional>from</optional>
+        <parameter>bytea</parameter>
+        <optional>, <parameter>bytes</parameter></optional>
+        )</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Non-standard syntax for <function>trim()</function>
+       </entry>
+       <entry><literal>trim(both from 'yxTomxx'::bytea, 'xyz'::bytea)</literal></entry>
+       <entry><literal>Tom</literal></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -3629,58 +3619,44 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         Remove the longest string containing only bytes appearing in
         <parameter>bytes</parameter> from the start and end of
         <parameter>string</parameter>
-      </entry>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
-
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>encode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert string from <parameter>src_encoding</parameter>
+        to <parameter>dest_encoding</parameter>, or raise an error.  See
+        <xref linkend="conversion-names"/> for the predefined conversions.
+        The <parameter>string</parameter> must be valid
+        in <parameter>src_encoding</parameter>.  New conversions may be
+        defined with <command>CREATE CONVERSION</command>.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>get_bit</primary>
         </indexterm>
-        <literal><function>get_bit(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_bit(<parameter>string</parameter> <type>bytea</type>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract bit from string
+         Extract bit from binary string
        </entry>
        <entry><literal>get_bit('Th\000omas'::bytea, 45)</literal></entry>
        <entry><literal>1</literal></entry>
@@ -3691,67 +3667,80 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>get_byte</primary>
         </indexterm>
-        <literal><function>get_byte(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_byte(<parameter>string</parameter> <type>bytea</type>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract byte from string
+        Extract byte from binary string
        </entry>
        <entry><literal>get_byte('Th\000omas'::bytea, 4)</literal></entry>
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>string</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>string</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>string</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose'::bytea, 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>string</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>text</type><link linkend="functions-hashnote">*</link></entry>
+       <entry>
+        Calculates the MD5 hash of <parameter>string</parameter>, returning
+        the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>set_bit</primary>
         </indexterm>
-        <literal><function>set_bit(<parameter>string</parameter>,
+        <literal><function>set_bit(<parameter>string</parameter> <type>bytea</type>,
         <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set bit in string
+        Set bit in binary string
        </entry>
        <entry><literal>set_bit('Th\000omas'::bytea, 45, 0)</literal></entry>
        <entry><literal>Th\000omAs</literal></entry>
@@ -3762,12 +3751,12 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>set_byte</primary>
         </indexterm>
-        <literal><function>set_byte(<parameter>string</parameter>,
+        <literal><function>set_byte(<parameter>string</parameter> <type>bytea</type>,
         <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set byte in string
+        Set byte in binary string
        </entry>
        <entry><literal>set_byte('Th\000omas'::bytea, 4, 64)</literal></entry>
        <entry><literal>Th\000o@as</literal></entry>
@@ -3778,13 +3767,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha224</primary>
         </indexterm>
-        <literal><function>sha224(<type>bytea</type>)</function></literal>
+        <literal><function>sha224(<parameter>string</parameter> <type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         SHA-224 hash
        </entry>
-       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>sha224('abc'::bytea)</literal></entry>
        <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
       </row>
 
@@ -3793,13 +3782,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha256</primary>
         </indexterm>
-        <literal><function>sha256(<type>bytea</type>)</function></literal>
+        <literal><function>sha256(<parameter>string</parameter> <type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         SHA-256 hash
        </entry>
-       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>sha256('abc'::bytea)</literal></entry>
        <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
       </row>
 
@@ -3808,13 +3797,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha384</primary>
         </indexterm>
-        <literal><function>sha384(<type>bytea</type>)</function></literal>
+        <literal><function>sha384(<parameter>string</parameter> <type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
         SHA-384 hash
        </entry>
-       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>sha384('abc'::bytea)</literal></entry>
        <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
       </row>
 
@@ -3823,15 +3812,31 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha512</primary>
         </indexterm>
-        <literal><function>sha512(<type>bytea</type>)</function></literal>
+        <literal><function>sha512(<parameter>string</parameter> <type>bytea</type>)</function></literal>
        </entry>
-       <entry><type>bytea</type></entry>
+       <entry><type>bytea</type><link linkend="functions-hashnote">*</link></entry>
        <entry>
-        SHA-512 hash
+         SHA-512 hash
        </entry>
-       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>sha512('abc'::bytea)</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>substr</primary>
+        </indexterm>
+        <literal><function>substr(<parameter>string</parameter> <type>bytea</type>, <parameter>from</parameter> <optional>, <parameter>count</parameter> </optional>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Extract substring (same as
+        <literal>substring(<parameter>bytea</parameter> from <parameter>from</parameter> for <parameter>count</parameter>)</literal>)
+       </entry>
+       <entry><literal>substr('alphabet', 3, 2)</literal></entry>
+       <entry><literal>ph</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>
@@ -3844,7 +3849,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    the first byte, and bit 15 is the most significant bit of the second byte.
   </para>
 
-  <para>
+  <para id="functions-hashnote">
    Note that for historic reasons, the function <function>md5</function>
    returns a hex-encoded value of type <type>text</type> whereas the SHA-2
    functions return type <type>bytea</type>.  Use the functions
@@ -3858,6 +3863,210 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    <xref linkend="functions-aggregate"/> and the large object functions
    in <xref linkend="lo-funcs"/>.
   </para>
+
+  <sect2 id="functions-convertingstringbinary">
+   <title>String to Binary and Binary to String Conversion</title>
+
+   <indexterm>
+    <primary>function</primary>
+    <secondary>converting between string and binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>character string</primary>
+     <secondary>converting to binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>binary string</primary>
+     <secondary>converting to character string</secondary>
+   </indexterm>
+
+   <table id="functions-binarystringconversions">
+    <title>Binary/String Conversion Functions</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+       <entry>Example</entry>
+       <entry>Result</entry>
+      </row>
+     </thead>
+
+     <tbody>
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_from</primary>
+         </indexterm>
+         <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
+         <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Convert string to the database encoding.  The original encoding
+         is specified by <parameter>src_encoding</parameter>. The
+         <parameter>string</parameter> must be valid in this encoding.  See
+         <xref linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+        <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_to</primary>
+         </indexterm>
+         <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
+         <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Convert string to <parameter>dest_encoding</parameter>.  See <xref
+         linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
+        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>encode</primary>
+         </indexterm>
+         <indexterm>
+          <primary>base64 encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>hex encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>escape encoding</primary>
+         </indexterm>
+         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Encode binary data into a textual representation.  Supported
+         formats are:
+         <link linkend="base64-encoding"><literal>base64</literal></link>,
+         <link linkend="escape-encoding"><literal>escape</literal></link>,
+         <link linkend="hex-encoding"><literal>hex</literal></link>.
+        </entry>
+        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+        <entry><literal>MTIzAAE=</literal></entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>decode</primary>
+         </indexterm>
+         <indexterm>
+           <primary>base64 encoding</primary>
+         </indexterm>
+         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Decode binary data from textual representation in <parameter>string</parameter>.
+         <link linkend="encoding-options">Options
+         for <parameter>format</parameter></link> are same as
+         in <function>encode</function>.
+        </entry>
+        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
+        <entry><literal>\x3132330001</literal></entry>
+       </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+  </sect2>
  </sect1>

#37

Karl O. Pinc

kop@karlpinc.com

about 6 years ago

In reply to: Karl O. Pinc (#36)

Re: Patch to document base64 encoding

On Mon, 6 Jan 2020 01:35:00 -0600
"Karl O. Pinc" <kop@meme.com> wrote:

On Sun, 5 Jan 2020 12:48:59 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

I'm not keen on calling the parameter the name of its type. I'd
suggest to keep "string" as a name everywhere, which is not a type
name in Pg.

The functions descriptions are not homogeneous. Some have parameter
name & type "btrim(string bytea, bytes bytea)" and others only type
or parameter with tagged as a parameter "get_bit(bytea,
offset)" (first param), "sha224(bytea)".

I'd suggest to be consistent, eg use "string bytea" everywhere
appropriate.

Ok. Done.

If you're interested, another possibility would be the
consistent use of "data bytea" everywhere.

But then the word
"string" does not really fit in a lot of the descriptions.
So this choice would involve re-writing descriptions

...

The trouble with using "data bytea" is that there might
need to be adjustments to the word "string" elsewhere in
the section, not just in the descriptions.

Let me know if you'd prefer "data bytea" to "string bytea"
and consequent frobbing of descriptions. That might be
out-of-scope for this patch. (Which is already
a poster-child for feature-creep.)

Another option would be to use "bytes bytea".
(The current patch uses "string bytea".)
This would probably also require some re-wording throughout.

Please let me know your preference. Thanks.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#38

Fabien COELHO

coelho@cri.ensmp.fr

about 6 years ago

In reply to: Karl O. Pinc (#37)

Re: Patch to document base64 encoding

Hello Karl,

Another option would be to use "bytes bytea".

(The current patch uses "string bytea".)
This would probably also require some re-wording throughout.

Please let me know your preference.

I like it, but this is only my own limited opinion, and I'm not a native
English speaker.

--
Fabien.

#39

Karl O. Pinc

kop@karlpinc.com

about 6 years ago

In reply to: Fabien COELHO (#38)

1 attachment(s)

Re: Patch to document base64 encoding

On Thu, 9 Jan 2020 08:27:28 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Another option would be to use "bytes bytea".

(The current patch uses "string bytea".)
This would probably also require some re-wording throughout.

I like it, but this is only my own limited opinion, and I'm not a
native English speaker.

Per your request for consistency I made this change throughout
the entire binary string section.

New patch attached: doc_base64_v13.patch

This required surprisingly little re-wording.
Added word "binary" into the descriptions of convert(),
substring(), convert_from(), and convert_to().

I also added data types to the call syntax of set_bit()
and set_byte().

And this patch adds hyperlinks from the get_bit(), get_byte(),
set_bit(), and set_byte() descriptions to the note
that offsets are zero-based.

I also removed the hyperlinked asterisks about the hash
function results and instead hyperlinked the word "hash"
in the descriptions. (Links to the note about md5()
returning hex text and the others returning bytea and how
to convert between the two.)

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v13.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b42f12862..3bd7232814 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1459,6 +1459,13 @@
     natively for the bit-string types.
    </para>
 
+   <para>
+     Functions which convert, both to and from, strings and
+     the <type>bytea</type> type
+     are <link linkend="functions-convertingstringbinary">documented
+     separately</link>.
+   </para>
+
    <para>
     <acronym>SQL</acronym> defines some string functions that use
     key words, rather than commas, to separate
@@ -1820,101 +1827,6 @@
        <entry><literal>abcde,2,22</literal></entry>
       </row>
 
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_to</primary>
-        </indexterm>
-        <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
-       </entry>
-       <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
-       <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-        <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
-       </entry>
-       <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
-       <entry><literal>\x3132330001</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
       <row>
        <entry id="format">
         <indexterm>
@@ -1982,19 +1894,6 @@
        <entry><literal>4</literal></entry>
       </row>
 
-      <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
       <row>
        <entry>
         <indexterm>
@@ -2044,8 +1943,8 @@
        </entry>
        <entry><type>text</type></entry>
        <entry>
-        Calculates the MD5 hash of <parameter>string</parameter>,
-        returning the result in hexadecimal
+        Calculates the MD5 <link linkend="functions-hashnote">hash</link>
+        of <parameter>string</parameter>, returning the result in hexadecimal
        </entry>
        <entry><literal>md5('abc')</literal></entry>
        <entry><literal>900150983cd24fb0 d6963f7d28e17f72</literal></entry>
@@ -2358,6 +2257,66 @@
        <entry><literal>test</literal></entry>
       </row>
 
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha224</primary>
+        </indexterm>
+        <literal><function>sha224(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-224 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha256</primary>
+        </indexterm>
+        <literal><function>sha256(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-256 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha384</primary>
+        </indexterm>
+        <literal><function>sha384(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-384 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha512</primary>
+        </indexterm>
+        <literal><function>sha512(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-512 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
+      </row>
+
       <row>
        <entry>
         <indexterm>
@@ -3494,7 +3453,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, functions which produce
+    strings from other binary inputs, and functions which convert
+    between <type>bytea</type> and strings.
    </para>
 
    <para>
@@ -3530,11 +3491,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
      <tbody>
       <row>
-       <entry><literal><parameter>string</parameter> <literal>||</literal>
-        <parameter>string</parameter></literal></entry>
+       <entry><literal><parameter>bytes</parameter> <literal>||</literal>
+        <parameter>bytes</parameter></literal></entry>
        <entry> <type>bytea</type> </entry>
        <entry>
-        String concatenation
+        Bytea concatenation
         <indexterm>
          <primary>binary string</primary>
          <secondary>concatenation</secondary>
@@ -3544,12 +3505,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>\\Post'gres\000</literal></entry>
       </row>
 
+      <row>
+       <entry>
+        <indexterm>
+         <primary>bit_length</primary>
+        </indexterm>
+        <literal><function>bit_length(<parameter>bytes</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>Number of bits in binary string</entry>
+       <entry><literal>bit_length('jo\000se':bytea)</literal></entry>
+       <entry><literal>40</literal></entry>
+      </row>
+
       <row>
        <entry>
         <indexterm>
          <primary>octet_length</primary>
         </indexterm>
-        <literal><function>octet_length(<parameter>string</parameter>)</function></literal>
+        <literal><function>octet_length(<parameter>bytes</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Number of bytes in binary string</entry>
@@ -3562,7 +3536,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>overlay</primary>
         </indexterm>
-        <literal><function>overlay(<parameter>string</parameter> placing <parameter>string</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>overlay(<parameter>bytes</parameter> placing <parameter>bytes</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3577,7 +3551,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>position</primary>
         </indexterm>
-        <literal><function>position(<parameter>substring</parameter> in <parameter>string</parameter>)</function></literal>
+        <literal><function>position(<parameter>bytesubstring</parameter> in <parameter>bytes</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Location of specified substring</entry>
@@ -3590,7 +3564,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>substring</primary>
         </indexterm>
-        <literal><function>substring(<parameter>string</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>substring(<parameter>bytes</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3606,18 +3580,34 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
          <primary>trim</primary>
         </indexterm>
         <literal><function>trim(<optional>both</optional>
-        <parameter>bytes</parameter> from
-        <parameter>string</parameter>)</function></literal>
+        <parameter>bytesremoved</parameter> from
+        <parameter>bytes</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Remove the longest string containing only bytes appearing in
-        <parameter>bytes</parameter> from the start
-        and end of <parameter>string</parameter>
+        <parameter>bytesremoved</parameter> from the start
+        and end of <parameter>bytes</parameter>
        </entry>
        <entry><literal>trim('\000\001'::bytea from '\000Tom\001'::bytea)</literal></entry>
        <entry><literal>Tom</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <literal><function>trim(<optional>leading | trailing
+        | both</optional> <optional>from</optional>
+        <parameter>bytes</parameter>
+        <optional>, <parameter>bytesremoved</parameter></optional>
+        )</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Non-standard syntax for <function>trim()</function>
+       </entry>
+       <entry><literal>trim(both from 'yxTomxx'::bytea, 'xyz'::bytea)</literal></entry>
+       <entry><literal>Tom</literal></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -3649,66 +3639,53 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>btrim</primary>
         </indexterm>
-        <literal><function>btrim(<parameter>string</parameter>
-        <type>bytea</type>, <parameter>bytes</parameter> <type>bytea</type>)</function></literal>
+        <literal><function>btrim(<parameter>bytes</parameter>
+        <type>bytea</type>, <parameter>bytesremoved</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Remove the longest string containing only bytes appearing in
-        <parameter>bytes</parameter> from the start and end of
-        <parameter>string</parameter>
-      </entry>
+        <parameter>bytesremoved</parameter> from the start and end of
+        <parameter>bytes</parameter>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
-
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>encode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert binary string from <parameter>src_encoding</parameter>
+        to <parameter>dest_encoding</parameter>, or raise an error.  See
+        <xref linkend="conversion-names"/> for the predefined conversions.
+        The <parameter>bytes</parameter> must be valid
+        in <parameter>src_encoding</parameter>.  New conversions may be
+        defined with <command>CREATE CONVERSION</command>.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>get_bit</primary>
         </indexterm>
-        <literal><function>get_bit(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_bit(<parameter>bytes</parameter> <type>bytea</type>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract bit from string
+        <link linkend="functions-zerobased-note">Extract</link> bit from binary
+        string
        </entry>
        <entry><literal>get_bit('Th\000omas'::bytea, 45)</literal></entry>
        <entry><literal>1</literal></entry>
@@ -3719,67 +3696,81 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>get_byte</primary>
         </indexterm>
-        <literal><function>get_byte(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_byte(<parameter>bytes</parameter> <type>bytea</type>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract byte from string
+        <link linkend="functions-zerobased-note">Extract</link> byte from binary string
        </entry>
        <entry><literal>get_byte('Th\000omas'::bytea, 4)</literal></entry>
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>bytes</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>bytes</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose'::bytea, 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Calculates the MD5 <link linkend="functions-hashnote">hash</link>
+        of <parameter>bytes</parameter>, returning the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>set_bit</primary>
         </indexterm>
-        <literal><function>set_bit(<parameter>string</parameter>,
-        <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
+        <literal><function>set_bit(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>int</type>,
+        <parameter>newvalue</parameter> <type>int</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set bit in string
+        <link linkend="functions-zerobased-note">Set</link> bit in binary string
        </entry>
        <entry><literal>set_bit('Th\000omas'::bytea, 45, 0)</literal></entry>
        <entry><literal>Th\000omAs</literal></entry>
@@ -3790,12 +3781,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>set_byte</primary>
         </indexterm>
-        <literal><function>set_byte(<parameter>string</parameter>,
-        <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
+        <literal><function>set_byte(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>int</type>,
+        <parameter>newvalue</parameter> <type>int</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set byte in string
+        <link linkend="functions-zerobased-note">Set</link> byte in binary string
        </entry>
        <entry><literal>set_byte('Th\000omas'::bytea, 4, 64)</literal></entry>
        <entry><literal>Th\000o@as</literal></entry>
@@ -3806,13 +3798,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha224</primary>
         </indexterm>
-        <literal><function>sha224(<type>bytea</type>)</function></literal>
+        <literal><function>sha224(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-224 hash
+        SHA-224 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>sha224('abc'::bytea)</literal></entry>
        <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
       </row>
 
@@ -3821,13 +3813,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha256</primary>
         </indexterm>
-        <literal><function>sha256(<type>bytea</type>)</function></literal>
+        <literal><function>sha256(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-256 hash
+        SHA-256 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>sha256('abc'::bytea)</literal></entry>
        <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
       </row>
 
@@ -3836,13 +3828,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha384</primary>
         </indexterm>
-        <literal><function>sha384(<type>bytea</type>)</function></literal>
+        <literal><function>sha384(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-384 hash
+        SHA-384 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>sha384('abc'::bytea)</literal></entry>
        <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
       </row>
 
@@ -3851,20 +3843,36 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha512</primary>
         </indexterm>
-        <literal><function>sha512(<type>bytea</type>)</function></literal>
+        <literal><function>sha512(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-512 hash
+         SHA-512 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>sha512('abc'::bytea)</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>substr</primary>
+        </indexterm>
+        <literal><function>substr(<parameter>bytes</parameter> <type>bytea</type>, <parameter>from</parameter> <optional>, <parameter>count</parameter> </optional>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Extract binary substring (same as
+        <literal>substring(<parameter>bytea</parameter> from <parameter>from</parameter> for <parameter>count</parameter>)</literal>)
+       </entry>
+       <entry><literal>substr('alphabet', 3, 2)</literal></entry>
+       <entry><literal>ph</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>
 
-  <para>
+  <para id="functions-zerobased-note">
    <function>get_byte</function> and <function>set_byte</function> number the first byte
    of a binary string as byte 0.
    <function>get_bit</function> and <function>set_bit</function> number bits from the
@@ -3872,7 +3880,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    the first byte, and bit 15 is the most significant bit of the second byte.
   </para>
 
-  <para>
+  <para id="functions-hashnote">
    Note that for historic reasons, the function <function>md5</function>
    returns a hex-encoded value of type <type>text</type> whereas the SHA-2
    functions return type <type>bytea</type>.  Use the functions
@@ -3886,6 +3894,210 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    <xref linkend="functions-aggregate"/> and the large object functions
    in <xref linkend="lo-funcs"/>.
   </para>
+
+  <sect2 id="functions-convertingstringbinary">
+   <title>String to Binary and Binary to String Conversion</title>
+
+   <indexterm>
+    <primary>function</primary>
+    <secondary>converting between string and binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>character string</primary>
+     <secondary>converting to binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>binary string</primary>
+     <secondary>converting to character string</secondary>
+   </indexterm>
+
+   <table id="functions-binarystringconversions">
+    <title>Binary/String Conversion Functions</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+       <entry>Example</entry>
+       <entry>Result</entry>
+      </row>
+     </thead>
+
+     <tbody>
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_from</primary>
+         </indexterm>
+         <literal><function>convert_from(<parameter>bytes</parameter> <type>bytea</type>,
+         <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Convert binary string to the database encoding.  The original encoding
+         is specified by <parameter>src_encoding</parameter>. The
+         <parameter>bytes</parameter> must be valid in this encoding.  See
+         <xref linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+        <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_to</primary>
+         </indexterm>
+         <literal><function>convert_to(<parameter>bytes</parameter> <type>text</type>,
+         <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Convert binary string to <parameter>dest_encoding</parameter>.  See
+         <xref linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
+        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>encode</primary>
+         </indexterm>
+         <indexterm>
+          <primary>base64 encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>hex encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>escape encoding</primary>
+         </indexterm>
+         <literal><function>encode(<parameter>bytes</parameter> <type>bytea</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Encode binary data into a textual representation.  Supported
+         formats are:
+         <link linkend="base64-encoding"><literal>base64</literal></link>,
+         <link linkend="escape-encoding"><literal>escape</literal></link>,
+         <link linkend="hex-encoding"><literal>hex</literal></link>.
+        </entry>
+        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+        <entry><literal>MTIzAAE=</literal></entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>decode</primary>
+         </indexterm>
+         <indexterm>
+           <primary>base64 encoding</primary>
+         </indexterm>
+         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Decode binary data from textual representation in <parameter>string</parameter>.
+         <link linkend="encoding-options">Options
+         for <parameter>format</parameter></link> are same as
+         in <function>encode</function>.
+        </entry>
+        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
+        <entry><literal>\x3132330001</literal></entry>
+       </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+  </sect2>
  </sect1>

#40

Fabien COELHO

coelho@cri.ensmp.fr

almost 6 years ago

In reply to: Karl O. Pinc (#39)

Re: Patch to document base64 encoding

Hello Karl,

New patch attached: doc_base64_v13.patch

This required surprisingly little re-wording.
Added word "binary" into the descriptions of convert(),
substring(), convert_from(), and convert_to().

I also added data types to the call syntax of set_bit()
and set_byte().

And this patch adds hyperlinks from the get_bit(), get_byte(),
set_bit(), and set_byte() descriptions to the note
that offsets are zero-based.

I also removed the hyperlinked asterisks about the hash
function results and instead hyperlinked the word "hash"
in the descriptions. (Links to the note about md5()
returning hex text and the others returning bytea and how
to convert between the two.)

Patch applies cleanly and compiles.

My 0.02€: The overall restructuration and cross references is an
improvement.

Some comments about v13:

The note about get_byte reads:

get_byte and set_byte number the first byte of a binary string as byte
0. get_bit and set_bit number bits from the right within each byte; for
example bit 0 is the least significant bit of the first byte, and bit 15
is the most significant bit of the second byte.

The two sentences starts with a lower case letter, which looks strange to
me. I'd suggest to put "Functions" at the beginning of the sentences:

Functions get_byte and set_byte number the first byte of a binary string
as byte 0. Functions get_bit and set_bit number bits from the right
within each byte; for example bit 0 is the least significant bit of the
first byte, and bit 15 is the most significant bit of the second byte.

The note about hash provides an example for getting the hex representation
out of sha*. I'd add an exemple to get the bytea representation from md5,
eg "DECODE(MD5('hello world'), 'hex')"…

Maybe the encode/decode in the note could be linked to the function
description? Well, they are just after, maybe it is not very useful.

The "Binary String Functions and Operators" 9.5 section has only one
subsection, "9.5.1", which is about at two thirds of the page. This
structure looks weird. ISTM that a subsection is missing for the beginning
of the page, or that the subsection should just be dropped, because it is
somehow redundant with the table title.

The "9.4" section has the same structural weirdness. Either remove the
subsection, or add some for the other parts?

--
Fabien.

#41

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Karl O. Pinc (#39)

Re: Patch to document base64 encoding

I just wanted to throw this in the archives; this doesn't need to affect
your patch.

Because of how the new tables look in the PDF docs, I thought it might
be a good time to research how to make each function-entry occupy two
rows: one for prototype, return type and description, and the other for
the example and its result. Below is a first cut of how you'd implement
that idea -- see colspec/spanspec/spanname ... only the output looks
almost as bad (though the benefit is that it doesn't overwrite cell
contents anymore).

I think we have two choices. One is to figure out how to make this
work (ie. make it pretty; maybe by using alternate cell backgrounds, say
one white and one very light gray; maybe by using thinner/thicker
inter-cell lines); the other is to forget tables altogether and format
the info in some completely different way.

<table id="functions-binarystringconversions">
<title>Binary/String Conversion Functions</title>
<tgroup cols="4">
<colspec colnum="1" colname="col1" colwidth="1*" />
<colspec colnum="2" colname="col2" colwidth="1*" />
<colspec colnum="3" colname="col3" colwidth="1*" />
<colspec colnum="4" colname="col4" colwidth="1*" />
<spanspec spanname="cols12" namest="col1" nameend="col2" />
<spanspec spanname="cols34" namest="col3" nameend="col4" />

<thead>
<row>
<entry spanname="cols12">Function</entry>
<entry>Return Type</entry>
<entry>Description</entry>
</row>
<row>
<entry spanname="cols12">Example</entry>
<entry spanname="cols34">Result</entry>
</row>
</thead>

<tbody>
<row>
<entry spanname="cols12">
<indexterm>
<primary>convert_from</primary>
</indexterm>
<literal><function>convert_from(<parameter>bytes</parameter> <type>bytea</type>,
<parameter>src_encoding</parameter> <type>name</type>)</function></literal>
</entry>
<entry><type>text</type></entry>
<entry>
Convert binary string to the database encoding. The original encoding
is specified by <parameter>src_encoding</parameter>. The
<parameter>bytes</parameter> must be valid in this encoding. See
<xref linkend="conversion-names"/> for available conversions.
</entry>
</row>
<row>
<entry spanname="cols12"><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
<entry spanname="cols34"><literal>text_in_utf8</literal> represented in the current database encoding</entry>
</row>

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#42

Karl O. Pinc

kop@karlpinc.com

almost 6 years ago

In reply to: Fabien COELHO (#40)

Re: Patch to document base64 encoding

On Thu, 16 Jan 2020 14:41:33 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

The "Binary String Functions and Operators" 9.5 section has only one
subsection, "9.5.1", which is about at two thirds of the page. This
structure looks weird. ISTM that a subsection is missing for the
beginning of the page, or that the subsection should just be dropped,
because it is somehow redundant with the table title.

The "9.4" section has the same structural weirdness. Either remove
the subsection, or add some for the other parts?

Hi Fabien,

cc-ing the folks who did the work on format(), who added a sub-section
9.4.1. The whole thread for that is here:
/messages/by-id/CAFj8pRBjMdAjybSZkczyez0x=FhC8WXvgR2wOYGuhrk1TUkraA@mail.gmail.com

I'm going to dis-agree with you on this. Yes, it's a little odd
to have only a single sub-section but it does not really bother me.

If there's a big/important enough chunk of information to present I like
seeing something in the table of contents. That's the "big thing"
to my mind.

I don't see a good way to get rid of "9.4.1. format". Adding
another sub-section heading above it just to have 2 seems
pointless.

I really want the "9.5.1 String to Binary and Binary to String
Conversion" to show up in the table of contents. Because it
is not at all obvious that "9.5. Binary String Functions and
Operators" is the place to look for conversions between
string and binary. Tom thought that merely having a separate
table for string<->binary functions "could be overkill"
so my impression right now is that having an entirely
separate section for these would be rejected.
(See: /messages/by-id/22540.1564501203@sss.pgh.pa.us)

Otherwise an entirely separate section might be the right approach.

The following *.1 sections
in the (devel version) documentation are "single sub-sections":

(Er, this is too much but once I started I figured I'd finish.)

5.10. Inheritance
5.10.1. Caveats
9.4. String Functions and Operators
9.4.1. format
9.30. Statistics Information Functions
9.30.1. Inspecting MCV Lists
15.4. Parallel Safety
15.4.1. Parallel Labeling for Functions and Aggregates
17. Installation from Source Code on Windows
17.1. Building with Visual C++ or the Microsoft Windows SDK
18.10. Secure TCP/IP Connections with GSSAPI Encryption
18.10.1. Basic Setup
30.2. Subscription
30.2.1. Replication Slot Management
30.5. Architecture
30.5.1. Initial Snapshot
37.13. User-Defined Types
37.13.1. TOAST Considerations
41. Procedural Languages
41.1. Installing Procedural Languages
50.5. Planner/Optimizer
50.5.1. Generating Possible Plans
52.3. SASL Authentication
52.3.1. SCRAM-SHA-256 Authentication
57. Writing a Table Sampling Method
57.1. Sampling Method Support Functions
58.1. Creating Custom Scan Paths
58.1.1. Custom Scan Path Callbacks
58.2. Creating Custom Scan Plans
58.2.1. Custom Scan Plan Callbacks
58.3. Executing Custom Scans
58.3.1. Custom Scan Execution Callbacks
64.4. Implementation
64.4.1. GiST Buffering Build
67.1. Introduction
67.1.1. Index Maintenance
68.6. Database Page Layout
68.6.1. Table Row Layout
G.2. Server Applications
pg_standby — supports the creation of a PostgreSQL warm standby server
I. The Source Code Repository
I.1. Getting the Source via Git
J.4. Documentation Authoring
J.4.1. Emacs
J.5. Style Guide
J.5.1. Reference Pages

I like that I can see these in the documentation.

FYI, the format sub-section, 9.4.1, was first mentioned
by Dean Rasheed in this email:
/messages/by-id/CAEZATCWLtRi-Vbh5k_2fYkOAPxas0wZh6a0brOohHtVOtHiddA@mail.gmail.com

"I'm thinking perhaps
format() should now have its own separate sub-section in the manual,
rather than trying to cram it's docs into a single table row."

There was never really any further discussion or objection to
having a separate sub-section.

Attaching a new patch to my next email, leaving off the
folks cc-ed regarding "9.4.1 format".

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#43

Karl O. Pinc

kop@karlpinc.com

almost 6 years ago

In reply to: Fabien COELHO (#40)

1 attachment(s)

Re: Patch to document base64 encoding

On Thu, 16 Jan 2020 14:41:33 +0100 (CET)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

Some comments about v13:

The note about get_byte reads:

get_byte and set_byte number the first byte of a binary string as
byte 0. get_bit and set_bit number bits from the right within each
byte; for example bit 0 is the least significant bit of the first
byte, and bit 15 is the most significant bit of the second byte.

The two sentences starts with a lower case letter, which looks
strange to me. I'd suggest to put "Functions" at the beginning of the
sentences:

Functions get_byte and set_byte number the first byte of a binary
string as byte 0. Functions get_bit and set_bit number bits from the
right within each byte; for example bit 0 is the least significant
bit of the first byte, and bit 15 is the most significant bit of the
second byte.

Excellent suggestion, done.

The note about hash provides an example for getting the hex
representation out of sha*. I'd add an exemple to get the bytea
representation from md5, eg "DECODE(MD5('hello world'), 'hex')"…

Ok. Done.

Maybe the encode/decode in the note could be linked to the function
description? Well, they are just after, maybe it is not very useful.

Can't hurt? Done.

Patch attached: doc_base64_v14.patch
Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Attachments:

doc_base64_v14.patchtext/x-patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 72072e7545..c075872364 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1459,6 +1459,13 @@
     natively for the bit-string types.
    </para>
 
+   <para>
+     Functions which convert, both to and from, strings and
+     the <type>bytea</type> type
+     are <link linkend="functions-convertingstringbinary">documented
+     separately</link>.
+   </para>
+
    <para>
     <acronym>SQL</acronym> defines some string functions that use
     key words, rather than commas, to separate
@@ -1820,101 +1827,6 @@
        <entry><literal>abcde,2,22</literal></entry>
       </row>
 
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert</primary>
-        </indexterm>
-        <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.  The
-        original encoding is specified by
-        <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-        Conversions can be defined by <command>CREATE CONVERSION</command>.
-        Also there are some predefined conversions. See <xref
-        linkend="conversion-names"/> for available conversions.
-       </entry>
-       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_from</primary>
-        </indexterm>
-        <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Convert string to the database encoding.  The original encoding
-        is specified by <parameter>src_encoding</parameter>. The
-        <parameter>string</parameter> must be valid in this encoding.
-       </entry>
-       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>convert_to</primary>
-        </indexterm>
-        <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
-        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Convert string to <parameter>dest_encoding</parameter>.
-       </entry>
-       <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
-       <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-        <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>bytea</type></entry>
-       <entry>
-        Decode binary data from textual representation in <parameter>string</parameter>.
-        Options for <parameter>format</parameter> are same as in <function>encode</function>.
-       </entry>
-       <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
-       <entry><literal>\x3132330001</literal></entry>
-      </row>
-
-      <row>
-       <entry>
-        <indexterm>
-         <primary>encode</primary>
-        </indexterm>
-        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-        <parameter>format</parameter> <type>text</type>)</function></literal>
-       </entry>
-       <entry><type>text</type></entry>
-       <entry>
-        Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
-       </entry>
-       <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-       <entry><literal>MTIzAAE=</literal></entry>
-      </row>
-
       <row>
        <entry id="format">
         <indexterm>
@@ -1982,19 +1894,6 @@
        <entry><literal>4</literal></entry>
       </row>
 
-      <row>
-       <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-       <entry><type>int</type></entry>
-       <entry>
-        Number of characters in <parameter>string</parameter> in the given
-        <parameter>encoding</parameter>. The <parameter>string</parameter>
-        must be valid in this encoding.
-       </entry>
-       <entry><literal>length('jose', 'UTF8')</literal></entry>
-       <entry><literal>4</literal></entry>
-      </row>
-
       <row>
        <entry>
         <indexterm>
@@ -2044,8 +1943,8 @@
        </entry>
        <entry><type>text</type></entry>
        <entry>
-        Calculates the MD5 hash of <parameter>string</parameter>,
-        returning the result in hexadecimal
+        Calculates the MD5 <link linkend="functions-hashnote">hash</link>
+        of <parameter>string</parameter>, returning the result in hexadecimal
        </entry>
        <entry><literal>md5('abc')</literal></entry>
        <entry><literal>900150983cd24fb0 d6963f7d28e17f72</literal></entry>
@@ -2358,6 +2257,66 @@
        <entry><literal>test</literal></entry>
       </row>
 
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha224</primary>
+        </indexterm>
+        <literal><function>sha224(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-224 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha256</primary>
+        </indexterm>
+        <literal><function>sha256(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-256 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha384</primary>
+        </indexterm>
+        <literal><function>sha384(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-384 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>sha512</primary>
+        </indexterm>
+        <literal><function>sha512(<type>string</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        SHA-512 <link linkend="functions-hashnote">hash</link>
+       </entry>
+       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
+      </row>
+
       <row>
        <entry>
         <indexterm>
@@ -3494,7 +3453,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, functions which produce
+    strings from other binary inputs, and functions which convert
+    between <type>bytea</type> and strings.
    </para>
 
    <para>
@@ -3530,11 +3491,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
      <tbody>
       <row>
-       <entry><literal><parameter>string</parameter> <literal>||</literal>
-        <parameter>string</parameter></literal></entry>
+       <entry><literal><parameter>bytes</parameter> <literal>||</literal>
+        <parameter>bytes</parameter></literal></entry>
        <entry> <type>bytea</type> </entry>
        <entry>
-        String concatenation
+        Bytea concatenation
         <indexterm>
          <primary>binary string</primary>
          <secondary>concatenation</secondary>
@@ -3544,12 +3505,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        <entry><literal>\\Post'gres\000</literal></entry>
       </row>
 
+      <row>
+       <entry>
+        <indexterm>
+         <primary>bit_length</primary>
+        </indexterm>
+        <literal><function>bit_length(<parameter>bytes</parameter>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>Number of bits in binary string</entry>
+       <entry><literal>bit_length('jo\000se':bytea)</literal></entry>
+       <entry><literal>40</literal></entry>
+      </row>
+
       <row>
        <entry>
         <indexterm>
          <primary>octet_length</primary>
         </indexterm>
-        <literal><function>octet_length(<parameter>string</parameter>)</function></literal>
+        <literal><function>octet_length(<parameter>bytes</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Number of bytes in binary string</entry>
@@ -3562,7 +3536,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>overlay</primary>
         </indexterm>
-        <literal><function>overlay(<parameter>string</parameter> placing <parameter>string</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>overlay(<parameter>bytes</parameter> placing <parameter>bytes</parameter> from <type>int</type> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3577,7 +3551,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>position</primary>
         </indexterm>
-        <literal><function>position(<parameter>substring</parameter> in <parameter>string</parameter>)</function></literal>
+        <literal><function>position(<parameter>bytesubstring</parameter> in <parameter>bytes</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>Location of specified substring</entry>
@@ -3590,7 +3564,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>substring</primary>
         </indexterm>
-        <literal><function>substring(<parameter>string</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
+        <literal><function>substring(<parameter>bytes</parameter> <optional>from <type>int</type></optional> <optional>for <type>int</type></optional>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
@@ -3606,18 +3580,34 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
          <primary>trim</primary>
         </indexterm>
         <literal><function>trim(<optional>both</optional>
-        <parameter>bytes</parameter> from
-        <parameter>string</parameter>)</function></literal>
+        <parameter>bytesremoved</parameter> from
+        <parameter>bytes</parameter>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Remove the longest string containing only bytes appearing in
-        <parameter>bytes</parameter> from the start
-        and end of <parameter>string</parameter>
+        <parameter>bytesremoved</parameter> from the start
+        and end of <parameter>bytes</parameter>
        </entry>
        <entry><literal>trim('\000\001'::bytea from '\000Tom\001'::bytea)</literal></entry>
        <entry><literal>Tom</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <literal><function>trim(<optional>leading | trailing
+        | both</optional> <optional>from</optional>
+        <parameter>bytes</parameter>
+        <optional>, <parameter>bytesremoved</parameter></optional>
+        )</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Non-standard syntax for <function>trim()</function>
+       </entry>
+       <entry><literal>trim(both from 'yxTomxx'::bytea, 'xyz'::bytea)</literal></entry>
+       <entry><literal>Tom</literal></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -3649,66 +3639,53 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>btrim</primary>
         </indexterm>
-        <literal><function>btrim(<parameter>string</parameter>
-        <type>bytea</type>, <parameter>bytes</parameter> <type>bytea</type>)</function></literal>
+        <literal><function>btrim(<parameter>bytes</parameter>
+        <type>bytea</type>, <parameter>bytesremoved</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
         Remove the longest string containing only bytes appearing in
-        <parameter>bytes</parameter> from the start and end of
-        <parameter>string</parameter>
-      </entry>
+        <parameter>bytesremoved</parameter> from the start and end of
+        <parameter>bytes</parameter>
+       </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-     <row>
-      <entry>
-        <indexterm>
-         <primary>decode</primary>
-        </indexterm>
-       <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
-      </entry>
-      <entry><literal>decode('123\000456', 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
-
-     <row>
-      <entry>
+      <row>
+       <entry>
         <indexterm>
-         <primary>encode</primary>
+         <primary>convert</primary>
         </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
-     </row>
+        <literal><function>convert(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>src_encoding</parameter> <type>name</type>,
+        <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Convert binary string from <parameter>src_encoding</parameter>
+        to <parameter>dest_encoding</parameter>, or raise an error.  See
+        <xref linkend="conversion-names"/> for the predefined conversions.
+        The <parameter>bytes</parameter> must be valid
+        in <parameter>src_encoding</parameter>.  New conversions may be
+        defined with <command>CREATE CONVERSION</command>.
+       </entry>
+       <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+       <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>get_bit</primary>
         </indexterm>
-        <literal><function>get_bit(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_bit(<parameter>bytes</parameter> <type>bytea</type>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract bit from string
+        <link linkend="functions-zerobased-note">Extract</link> bit from binary
+        string
        </entry>
        <entry><literal>get_bit('Th\000omas'::bytea, 45)</literal></entry>
        <entry><literal>1</literal></entry>
@@ -3719,67 +3696,81 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>get_byte</primary>
         </indexterm>
-        <literal><function>get_byte(<parameter>string</parameter>, <parameter>offset</parameter>)</function></literal>
+        <literal><function>get_byte(<parameter>bytes</parameter> <type>bytea</type>, <parameter>offset</parameter>)</function></literal>
        </entry>
        <entry><type>int</type></entry>
        <entry>
-        Extract byte from string
+        <link linkend="functions-zerobased-note">Extract</link> byte from binary string
        </entry>
        <entry><literal>get_byte('Th\000omas'::bytea, 4)</literal></entry>
        <entry><literal>109</literal></entry>
       </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>length</primary>
-       </indexterm>
-       <literal><function>length(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Length of binary string
-       <indexterm>
-        <primary>binary string</primary>
-        <secondary>length</secondary>
-       </indexterm>
-       <indexterm>
-        <primary>length</primary>
-        <secondary sortas="binary string">of a binary string</secondary>
-        <see>binary strings, length</see>
-       </indexterm>
-      </entry>
-      <entry><literal>length('jo\000se'::bytea)</literal></entry>
-      <entry><literal>5</literal></entry>
-     </row>
+      <row>
+       <entry>
+        <indexterm>
+         <primary>length</primary>
+        </indexterm>
+        <literal><function>length(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Length of binary string
+        <indexterm>
+         <primary>binary string</primary>
+         <secondary>length</secondary>
+        </indexterm>
+        <indexterm>
+         <primary>length</primary>
+         <secondary sortas="binary string">of a binary string</secondary>
+         <see>binary strings, length</see>
+        </indexterm>
+       </entry>
+       <entry><literal>length('jo\000se'::bytea)</literal></entry>
+       <entry><literal>5</literal></entry>
+      </row>
 
-     <row>
-      <entry>
-       <indexterm>
-        <primary>md5</primary>
-       </indexterm>
-       <literal><function>md5(<parameter>string</parameter>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Calculates the MD5 hash of <parameter>string</parameter>,
-       returning the result in hexadecimal
-      </entry>
-      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-      <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
-     </row>
+      <row>
+       <entry><literal><function>length(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+       <entry><type>int</type></entry>
+       <entry>
+        Number of characters in <parameter>bytes</parameter> in the given
+        <parameter>encoding</parameter>. The <parameter>bytes</parameter>
+        must be valid in this encoding.
+       </entry>
+       <entry><literal>length('jose'::bytea, 'UTF8')</literal></entry>
+       <entry><literal>4</literal></entry>
+      </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>md5</primary>
+        </indexterm>
+        <literal><function>md5(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
+       </entry>
+       <entry><type>text</type></entry>
+       <entry>
+        Calculates the MD5 <link linkend="functions-hashnote">hash</link>
+        of <parameter>bytes</parameter>, returning the result in hexadecimal
+       </entry>
+       <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+       <entry><literal>8ab2d3c9689aaf18&#x200B;b4958c334c82d8b1</literal></entry>
+      </row>
 
       <row>
        <entry>
         <indexterm>
          <primary>set_bit</primary>
         </indexterm>
-        <literal><function>set_bit(<parameter>string</parameter>,
-        <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
+        <literal><function>set_bit(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>int</type>,
+        <parameter>newvalue</parameter> <type>int</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set bit in string
+        <link linkend="functions-zerobased-note">Set</link> bit in binary string
        </entry>
        <entry><literal>set_bit('Th\000omas'::bytea, 45, 0)</literal></entry>
        <entry><literal>Th\000omAs</literal></entry>
@@ -3790,12 +3781,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>set_byte</primary>
         </indexterm>
-        <literal><function>set_byte(<parameter>string</parameter>,
-        <parameter>offset</parameter>, <parameter>newvalue</parameter>)</function></literal>
+        <literal><function>set_byte(<parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>int</type>,
+        <parameter>newvalue</parameter> <type>int</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        Set byte in string
+        <link linkend="functions-zerobased-note">Set</link> byte in binary string
        </entry>
        <entry><literal>set_byte('Th\000omas'::bytea, 4, 64)</literal></entry>
        <entry><literal>Th\000o@as</literal></entry>
@@ -3806,13 +3798,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha224</primary>
         </indexterm>
-        <literal><function>sha224(<type>bytea</type>)</function></literal>
+        <literal><function>sha224(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-224 hash
+        SHA-224 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha224('abc')</literal></entry>
+       <entry><literal>sha224('abc'::bytea)</literal></entry>
        <entry><literal>\x23097d223405d8228642a477bda2&#x200B;55b32aadbce4bda0b3f7e36c9da7</literal></entry>
       </row>
 
@@ -3821,13 +3813,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha256</primary>
         </indexterm>
-        <literal><function>sha256(<type>bytea</type>)</function></literal>
+        <literal><function>sha256(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-256 hash
+        SHA-256 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha256('abc')</literal></entry>
+       <entry><literal>sha256('abc'::bytea)</literal></entry>
        <entry><literal>\xba7816bf8f01cfea414140de5dae2223&#x200B;b00361a396177a9cb410ff61f20015ad</literal></entry>
       </row>
 
@@ -3836,13 +3828,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha384</primary>
         </indexterm>
-        <literal><function>sha384(<type>bytea</type>)</function></literal>
+        <literal><function>sha384(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-384 hash
+        SHA-384 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha384('abc')</literal></entry>
+       <entry><literal>sha384('abc'::bytea)</literal></entry>
        <entry><literal>\xcb00753f45a35e8bb5a03d699ac65007&#x200B;272c32ab0eded1631a8b605a43ff5bed&#x200B;8086072ba1e7cc2358baeca134c825a7</literal></entry>
       </row>
 
@@ -3851,34 +3843,54 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>sha512</primary>
         </indexterm>
-        <literal><function>sha512(<type>bytea</type>)</function></literal>
+        <literal><function>sha512(<parameter>bytes</parameter> <type>bytea</type>)</function></literal>
        </entry>
        <entry><type>bytea</type></entry>
        <entry>
-        SHA-512 hash
+         SHA-512 <link linkend="functions-hashnote">hash</link>
        </entry>
-       <entry><literal>sha512('abc')</literal></entry>
+       <entry><literal>sha512('abc'::bytea)</literal></entry>
        <entry><literal>\xddaf35a193617abacc417349ae204131&#x200B;12e6fa4e89a97ea20a9eeee64b55d39a&#x200B;2192992a274fc1a836ba3c23a3feebbd&#x200B;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
       </row>
+
+      <row>
+       <entry>
+        <indexterm>
+         <primary>substr</primary>
+        </indexterm>
+        <literal><function>substr(<parameter>bytes</parameter> <type>bytea</type>, <parameter>from</parameter> <optional>, <parameter>count</parameter> </optional>)</function></literal>
+       </entry>
+       <entry><type>bytea</type></entry>
+       <entry>
+        Extract binary substring (same as
+        <literal>substring(<parameter>bytea</parameter> from <parameter>from</parameter> for <parameter>count</parameter>)</literal>)
+       </entry>
+       <entry><literal>substr('alphabet', 3, 2)</literal></entry>
+       <entry><literal>ph</literal></entry>
+      </row>
     </tbody>
    </tgroup>
   </table>
 
-  <para>
-   <function>get_byte</function> and <function>set_byte</function> number the first byte
-   of a binary string as byte 0.
-   <function>get_bit</function> and <function>set_bit</function> number bits from the
-   right within each byte; for example bit 0 is the least significant bit of
-   the first byte, and bit 15 is the most significant bit of the second byte.
+  <para id="functions-zerobased-note">
+   Functions <function>get_byte</function> and <function>set_byte</function>
+   number the first byte of a binary string as byte 0.
+   Functions <function>get_bit</function> and <function>set_bit</function>
+   number bits from the right within each byte; for example bit 0 is the least
+   significant bit of the first byte, and bit 15 is the most significant bit
+   of the second byte.
   </para>
 
-  <para>
+  <para id="functions-hashnote">
    Note that for historic reasons, the function <function>md5</function>
    returns a hex-encoded value of type <type>text</type> whereas the SHA-2
    functions return type <type>bytea</type>.  Use the functions
-   <function>encode</function> and <function>decode</function> to convert
-   between the two, for example <literal>encode(sha256('abc'),
-   'hex')</literal> to get a hex-encoded text representation.
+   <link linkend="functions-encode"><function>encode</function></link>
+   and <link linkend="functions-decode"><function>decode</function></link> to
+   convert between the two.  For example <literal>encode(sha256('abc'),
+   'hex')</literal> to get a hex-encoded text representation
+   and <literal>decode(md5('abc'), 'hex')</literal> to get
+   a <type>bytea</type> value.
   </para>
 
   <para>
@@ -3886,6 +3898,210 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    <xref linkend="functions-aggregate"/> and the large object functions
    in <xref linkend="lo-funcs"/>.
   </para>
+
+  <sect2 id="functions-convertingstringbinary">
+   <title>String to Binary and Binary to String Conversion</title>
+
+   <indexterm>
+    <primary>function</primary>
+    <secondary>converting between string and binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>character string</primary>
+     <secondary>converting to binary</secondary>
+   </indexterm>
+   <indexterm>
+     <primary>binary string</primary>
+     <secondary>converting to character string</secondary>
+   </indexterm>
+
+   <table id="functions-binarystringconversions">
+    <title>Binary/String Conversion Functions</title>
+    <tgroup cols="5">
+     <thead>
+      <row>
+       <entry>Function</entry>
+       <entry>Return Type</entry>
+       <entry>Description</entry>
+       <entry>Example</entry>
+       <entry>Result</entry>
+      </row>
+     </thead>
+
+     <tbody>
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_from</primary>
+         </indexterm>
+         <literal><function>convert_from(<parameter>bytes</parameter> <type>bytea</type>,
+         <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Convert binary string to the database encoding.  The original encoding
+         is specified by <parameter>src_encoding</parameter>. The
+         <parameter>bytes</parameter> must be valid in this encoding.  See
+         <xref linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+        <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+       </row>
+
+       <row>
+        <entry>
+         <indexterm>
+          <primary>convert_to</primary>
+         </indexterm>
+         <literal><function>convert_to(<parameter>bytes</parameter> <type>text</type>,
+         <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Convert binary string to <parameter>dest_encoding</parameter>.  See
+         <xref linkend="conversion-names"/> for available conversions.
+        </entry>
+        <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
+        <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
+       </row>
+
+       <row id="functions-encode">
+        <entry>
+         <indexterm>
+          <primary>encode</primary>
+         </indexterm>
+         <indexterm>
+          <primary>base64 encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>hex encoding</primary>
+         </indexterm>
+         <indexterm>
+          <primary>escape encoding</primary>
+         </indexterm>
+         <literal><function>encode(<parameter>bytes</parameter> <type>bytea</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>text</type></entry>
+        <entry>
+         Encode binary data into a textual representation.  Supported
+         formats are:
+         <link linkend="base64-encoding"><literal>base64</literal></link>,
+         <link linkend="escape-encoding"><literal>escape</literal></link>,
+         <link linkend="hex-encoding"><literal>hex</literal></link>.
+        </entry>
+        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+        <entry><literal>MTIzAAE=</literal></entry>
+       </row>
+
+       <row id="functions-decode">
+        <entry>
+         <indexterm>
+          <primary>decode</primary>
+         </indexterm>
+         <indexterm>
+           <primary>base64 encoding</primary>
+         </indexterm>
+         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
+         <parameter>format</parameter> <type>text</type>)</function></literal>
+        </entry>
+        <entry><type>bytea</type></entry>
+        <entry>
+         Decode binary data from textual representation in <parameter>string</parameter>.
+         <link linkend="encoding-options">Options
+         for <parameter>format</parameter></link> are same as
+         in <function>encode</function>.
+        </entry>
+        <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
+        <entry><literal>\x3132330001</literal></entry>
+       </row>
+     </tbody>
+    </tgroup>
+   </table>
+
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para id="encoding-options">
+     The <function>encode</function> and <function>decode</function> functions
+     support the following encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             The <literal>escape</literal> encoding converts zero bytes and
+             high-bit-set bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             The <literal>hex</literal> encoding represents each 4 bits of
+             data as a single hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+  </sect2>
  </sect1>

#44

Karl O. Pinc

kop@karlpinc.com

almost 6 years ago

In reply to: Alvaro Herrera (#41)

Re: Patch to document base64 encoding

On Thu, 16 Jan 2020 15:44:44 -0300
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Because of how the new tables look in the PDF docs, I thought it might
be a good time to research how to make each function-entry occupy two
rows: one for prototype, return type and description, and the other
for the example and its result.

Another approach might be to fix/change the software that generates
PDFs. Or whatever turns it into latex if that's the
intermediate and really where the problem lies. (FWIW, I've had luck
with dblatex.)

(Maybe best to take this thread to the pgsql-docs mailing list?)

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

#45

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Karl O. Pinc (#43)

Re: Patch to document base64 encoding

Tom, you're marked as committer for this one in the commitfest app; are
you still intending to get it committed? If not, I can.

Thanks,

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#46

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Alvaro Herrera (#45)

Re: Patch to document base64 encoding

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom, you're marked as committer for this one in the commitfest app; are
you still intending to get it committed? If not, I can.

I've not been paying much attention to this thread, but I'll take
another look.

regards, tom lane

#47

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Tom Lane (#46)

Re: Patch to document base64 encoding

I've pushed this patch with some further hacking.

Most notably, I shoved the conversion-names table over to charset.sgml,
as I'd mused about doing upthread. It no longer had any reason at all
to be in section 9.4. We could have moved it down to 9.5, but I felt
that it would still be wrongly placed there. Since none of these
functions actually take conversion names, it's not very relevant
documentation for them --- you can generally assume that whatever
conversion you want is available, and you'll usually be right.

Another point relevant to the discussion is that I dropped the <sect2>
for the conversion functions. It seemed to me that giving them their
own table was enough --- the discussion about them isn't lengthy enough
to justify a separate section, IMO. I also don't buy the argument that
we need a <sect2> to make these things visible in the table of contents.
We have the cross-reference from section 9.4, as well as a passel of
index entries, to help people who don't know where to look. (Once upon
a time we had a list of tables alongside the TOC; maybe that should be
resurrected?)

I took the liberty of doing some copy-editing on nearby function
descriptions, too, mostly to try to give them more uniform style.
And there were some errors; notably, the patch added descriptions
for shaNNN(text), which are functions we do not have AFAICS.

I share Alvaro's feeling that these tables could stand to be reformatted
so that they're not such a mess when rendered in narrower formats. But
that seems like a task for a separate patch, especially since the problem
is hardly confined to these two sections.

regards, tom lane

#48

Karl O. Pinc

kop@karlpinc.com

almost 6 years ago

In reply to: Tom Lane (#47)

Re: Patch to document base64 encoding

On Sat, 18 Jan 2020 18:05:49 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

And there were some errors; notably, the patch added descriptions
for shaNNN(text), which are functions we do not have AFAICS.

Apologies for that, my mistake.

Thank you to Fabien and everybody who helped.

Regards,

Karl <kop@karlpinc.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein