Glossary and initdb definition work for "superuser" and database/cluster

Started by David G. Johnston about 3 years ago, 6 messages
#1 David G. Johnston
david.g.johnston@gmail.com
1 attachment(s)

Hey,

Recent threads have pointed out some long-standing doc language in initdb
that could be made more precise, especially in light of the relatively
recent addition of a glossary. Toward this end I'm attaching a patch that
defines three terms: "bootstrap superuser", "database superuser" and
"superuser". I didn't add any extra-glossary links for the latter two but
did for the limited-in-scope bootstrap superuser that is really only
defined in initdb (actually, I suspect the authorization docs could use a
link too but haven't gone looking for an appropriate place yet).

In passing I also changed a few places where the documentation says
"database" when the thing being referred to is basically the file system
data directory, which is a cluster-scoped thing.

I did some grep'ing, though another pass or two is probably worthwhile.
For now I submit a preliminary patch for consideration and buy-in before
trying to polish it up.

David J.

Attachments:

initdb-and-glossary.diff (application/octet-stream)
diff --git a/doc/src/sgml/adminpack.sgml b/doc/src/sgml/adminpack.sgml
index 1150b7f5bb..99acabda2d 100644
--- a/doc/src/sgml/adminpack.sgml
+++ b/doc/src/sgml/adminpack.sgml
@@ -12,7 +12,7 @@
   <application>pgAdmin</application> and other administration and management tools can
   use to provide additional functionality, such as remote management
   of server log files.
-  Use of all these functions is only allowed to the superuser by default but may be
+  Use of all these functions is only allowed to database superusers by default but may be
   allowed to other users by using the <command>GRANT</command> command.
  </para>
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index d6d0a3a814..9efb694248 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -233,6 +233,28 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-bootstrap-superuser">
+   <glossterm>Bootstrap superuser</glossterm>
+   <glossdef>
+    <para>
+     The very first user created in a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm>.
+     By default this user is named <literal>postgres</literal> but
+     the <option>--username</option> argument to <xref linkend="app-initdb" />
+     allows this to be changed.
+    </para>
+    <para>
+     This user owns all system catalog tables in each database.  It also is the role
+     from which all granted permission originate.  Because of these things this
+     role may not be dropped.
+    </para>
+    <para>
+     This role also behaves as a normal
+     <glossterm linkend="glossary-database-superuser">database superuser</glossterm>.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-cast">
    <glossterm>Cast</glossterm>
    <glossdef>
@@ -489,6 +511,25 @@
    <glosssee otherterm="glossary-instance" />
   </glossentry>
 
+  <glossentry id="glossary-database-superuser">
+   <glossterm>Database superuser</glossterm>
+   <glossdef>
+    <para>
+     A role having the <literal>superuser</literal> <xref linkend="role-attributes"/>.
+    </para>
+    <para>
+     All superusers in the system are collectively referred to as database superusers throughout
+     the documentation.  Any plain use of the term
+     <glossterm linkend="glossary-superuser">superuser</glossterm>
+     can be interpreted to mean database superuser.
+    </para>
+    <para>
+     While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
+     a database superuser it has special obligations and restrictions that plain database superusers do not.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-data-directory">
    <glossterm>Data directory</glossterm>
    <glossdef>
@@ -1577,6 +1618,16 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-superuser">
+   <glossterm>Superuser</glossterm>
+   <glossdef>
+    <para>
+     As used in this documentation it is a synonym for
+     <glossterm linkend="glossary-database-superuser">database superuser</glossterm>.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-system-catalog">
    <glossterm>System catalog</glossterm>
    <glossdef>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 8158896298..2132f32ac2 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -44,10 +44,13 @@ PostgreSQL documentation
 
   <para>
    Creating a database cluster consists of creating the directories in
-   which the database data will live, generating the shared catalog
+   which the cluster data will live, generating the shared catalog
    tables (tables that belong to the whole cluster rather than to any
-   particular database), and creating the <literal>postgres</literal>,
-   <literal>template1</literal>, and <literal>template0</literal> databases.
+   particular database), creating the <literal>postgres</literal>,
+   <literal>template1</literal>, and <literal>template0</literal> databases,
+   and creating the
+   <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>
+   (<literal>postgres</literal>, by default).
    The <literal>postgres</literal> database is a default database meant
    for use by users, utilities and third party applications.
    <literal>template1</literal> and <literal>template0</literal> are
@@ -64,14 +67,14 @@ PostgreSQL documentation
    directory of the desired data directory is root-owned. To initialize
    in such a setup, create an empty data directory as root, then use
    <command>chown</command> to assign ownership of that directory to the
-   database user account, then <command>su</command> to become the
-   database user to run <command>initdb</command>.
+   cluster user account, then <command>su</command> to become the
+   cluster user to run <command>initdb</command>.
   </para>
 
   <para>
-   <command>initdb</command> must be run as the user that will own the
-   server process, because the server needs to have access to the
-   files and directories that <command>initdb</command> creates.
+   <command>initdb</command> must be run as the operating-system user
+   that will own the server process, because the server needs to have
+   access to the files and directories that <command>initdb</command> creates.
    Since the server cannot be run as root, you must not run
    <command>initdb</command> as root either.  (It will in fact refuse
    to do so.)
@@ -79,7 +82,7 @@ PostgreSQL documentation
 
   <para>
     For security reasons the new cluster created by <command>initdb</command>
-    will only be accessible by the cluster owner by default.  The
+    will only be accessible by the cluster user by default.  The
     <option>--allow-group-access</option> option allows any user in the same
     group as the cluster owner to read files in the cluster.  This is useful
     for performing backups as a non-privileged user.
@@ -196,7 +199,7 @@ PostgreSQL documentation
         <command>initdb</command>, but you can avoid writing it by
         setting the <envar>PGDATA</envar> environment variable, which
         can be convenient since the database server
-        (<command>postgres</command>) can find the database
+        (<command>postgres</command>) can find the data
         directory later by the same variable.
        </para>
       </listitem>
@@ -338,7 +341,7 @@ PostgreSQL documentation
       <term><option>--pwfile=<replaceable>filename</replaceable></option></term>
       <listitem>
        <para>
-        Makes <command>initdb</command> read the database superuser's password
+        Makes <command>initdb</command> read the bootstrap superuser's password
         from a file.  The first line of the file is taken as the password.
        </para>
       </listitem>
@@ -349,7 +352,7 @@ PostgreSQL documentation
       <term><option>--sync-only</option></term>
       <listitem>
        <para>
-        Safely write all database files to disk and exit.  This does not
+        Safely write all database cluster files to disk and exit.  This does not
         perform any of the normal <application>initdb</application> operations.
         Generally, this option is useful for ensuring reliable recovery after
         changing <xref linkend="guc-fsync"/> from <literal>off</literal> to
@@ -374,10 +377,11 @@ PostgreSQL documentation
       <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
       <listitem>
        <para>
-        Selects the user name of the database superuser. This defaults
-        to the name of the effective user running
+        Selects the user name of the
+        <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>.
+        This defaults to the name of the effective user running
         <command>initdb</command>. It is really not important what the
-        superuser's name is, but one might choose to keep the
+        bootstrap superuser's name is, but one might choose to keep the
         customary name <systemitem>postgres</systemitem>, even if the operating
         system user's name is different.
        </para>
@@ -390,7 +394,7 @@ PostgreSQL documentation
       <listitem>
        <para>
         Makes <command>initdb</command> prompt for a password
-        to give the database superuser. If you don't plan on using password
+        to give the bootstrap superuser. If you don't plan on using password
         authentication, this is not important.  Otherwise you won't be
         able to use password authentication until you have a password
         set up.
@@ -422,7 +426,7 @@ PostgreSQL documentation
 
        <para>
         It may be useful to adjust this size to control the granularity of
-        WAL log shipping or archiving.  Also, in databases with a high volume
+        WAL log shipping or archiving.  Also, in clusters with a high volume
         of WAL, the sheer number of WAL files per directory can become a
         performance and management problem.  Increasing the WAL file size
         will reduce the number of WAL files.
#2 Justin Pryzby
pryzby@telsasoft.com
In reply to: David G. Johnston (#1)
Re: Glossary and initdb definition work for "superuser" and database/cluster

On Tue, Nov 01, 2022 at 03:47:15PM -0700, David G. Johnston wrote:

Hey,

Recent threads have pointed out some long-standing doc language in initdb
that could be made more precise, especially in light of the relatively
recent addition of a glossary. Toward this end I'm attaching a patch that
defines three terms: "bootstrap superuser", "database superuser" and
"superuser". I didn't add any extra-glossary links for the latter two but
did for the limited-in-scope bootstrap superuser that is really only
defined in initdb (actually, I suspect the authorization docs could use a
link too but haven't gone looking for an appropriate place yet).

In passing I also changed a few places where the documentation says
"database" when the thing being referred to is basically the file system
data directory, which is a cluster-scoped thing.

I did some grep'ing, though another pass or two is probably worthwhile.
For now I submit a preliminary patch for consideration and buy-in before
trying to polish it up.

I think this is wrong:

| https://www.postgresql.org/docs/devel/app-initdb.html
| -U username
| --username=username
|
| Selects the user name of the database superuser. This defaults to
| the name of the effective user running initdb [...]

It's true that the user who runs initdb is typically named "postgres",
but that's only by convention.
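
As a rough sketch of the behaviour under discussion (this is not initdb's actual implementation, and the function names `effective_os_user` and `bootstrap_superuser_name` are made up for illustration), the default could be modelled like this in Python on a POSIX system:

```python
import os
import pwd
from typing import Callable, Optional


def effective_os_user() -> str:
    """Return the name of the effective operating-system user (POSIX only)."""
    return pwd.getpwuid(os.geteuid()).pw_name


def bootstrap_superuser_name(
    username_option: Optional[str] = None,
    lookup: Callable[[], str] = effective_os_user,
) -> str:
    """Model of the documented default for --username.

    An explicit option value wins; otherwise the name of the effective
    user running the program is used. "postgres" is never assumed.
    The `lookup` parameter is hypothetical and exists only so the
    fallback can be exercised without depending on the real account.
    """
    if username_option:
        return username_option
    return lookup()
```

So the name "postgres" only shows up when it is passed explicitly or when the operating-system account running initdb happens to carry that name.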

+     This user owns all system catalog tables in each database.  It also is the role
+     from which all granted permission originate.  Because of these things this
+     role may not be dropped.

plural permissions

these comma

+     While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
+     a database superuser it has special obligations and restrictions that plain database superusers do not.

comma it

+  <glossentry id="glossary-superuser">
+   <glossterm>Superuser</glossterm>
+   <glossdef>
+    <para>
+     As used in this documentation it is a synonym for

comma it

Creating a database cluster consists of creating the directories in
-   which the database data will live, generating the shared catalog
+   which the cluster data will live, generating the shared catalog

+1

tables (tables that belong to the whole cluster rather than to any
-   particular database), and creating the <literal>postgres</literal>,
-   <literal>template1</literal>, and <literal>template0</literal> databases.
+   particular database), creating the <literal>postgres</literal>,
+   <literal>template1</literal>, and <literal>template0</literal> databases,
+   and creating the
+   <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>
+   (<literal>postgres</literal>, by default).

"postgres" is wrong

For security reasons the new cluster created by <command>initdb</command>
-    will only be accessible by the cluster owner by default.  The
+    will only be accessible by the cluster user by default.  The

I prefer "cluster owner"

<command>initdb</command>, but you can avoid writing it by
setting the <envar>PGDATA</envar> environment variable, which
can be convenient since the database server
-        (<command>postgres</command>) can find the database
+        (<command>postgres</command>) can find the data
directory later by the same variable.

+1

-        Makes <command>initdb</command> read the database superuser's password
+        Makes <command>initdb</command> read the bootstrap superuser's password
from a file.  The first line of the file is taken as the password.

+1

-        Safely write all database files to disk and exit.  This does not
+        Safely write all database cluster files to disk and exit.  This does not

+1

It may be useful to adjust this size to control the granularity of
-        WAL log shipping or archiving.  Also, in databases with a high volume
+        WAL log shipping or archiving.  Also, in clusters with a high volume
of WAL, the sheer number of WAL files per directory can become a

+1

#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Justin Pryzby (#2)
Re: Glossary and initdb definition work for "superuser" and database/cluster

On Tue, Nov 1, 2022 at 5:20 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Nov 01, 2022 at 03:47:15PM -0700, David G. Johnston wrote:

I think this is wrong:

| https://www.postgresql.org/docs/devel/app-initdb.html
| -U username
| --username=username
|
| Selects the user name of the database superuser. This defaults to
| the name of the effective user running initdb [...]

It's true that the user who runs initdb is typically named "postgres",
but that's only by convention.

Thanks. I feel bad for missing this one given that I've been working on
fixing up the default libpq user name wording.

+ This user owns all system catalog tables in each database. It also is the role
+ from which all granted permission originate. Because of these things this
+ role may not be dropped.

plural permissions

+1

these comma

things comma actually (+0.5)

+ While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
+ a database superuser it has special obligations and restrictions that plain database superusers do not.

comma it

+ 0.5

tables (tables that belong to the whole cluster rather than to any
- particular database), and creating the <literal>postgres</literal>,
- <literal>template1</literal>, and <literal>template0</literal> databases.
+   particular database), creating the <literal>postgres</literal>,
+   <literal>template1</literal>, and <literal>template0</literal> databases,
+   and creating the
+   <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>
+ (<literal>postgres</literal>, by default).

"postgres" is wrong

Yep, will give this another look to see if anywhere but the actual option
description wants to cover how this really works (or maybe just point the
reader there).

For security reasons the new cluster created by <command>initdb</command>
-    will only be accessible by the cluster owner by default.  The
+    will only be accessible by the cluster user by default.  The

I prefer "cluster owner"

I'll either need to change it back or fix the one in the next sentence...

I'm still leaning toward continuing to use cluster user like everywhere
else on the page instead of adding a new term. The fact that this doesn't
work on Windows makes having it in the description section at all
arguable. I'd rather rewrite it something like:

"On POSIX systems, the resulting data directory, and all of its contents,
will have permissions of 700, though you can use --allow-group-access to
instead get 750. In either case, the effective user running initdb will
become the owner and group for the files created within the data directory."

(I haven't tried to prove this owner:group dynamic, but having 700 or 750
and specifying the alternative does result in the directory having its
permission bits changed during initdb)
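
To make the 700 versus 750 distinction concrete, here is a minimal Python sketch of those permission bits. It only manipulates a temporary directory and is not how initdb itself is implemented; the helper names `data_dir_mode`, `describe`, `apply_mode` and `demo` are invented for this illustration.

```python
import os
import stat
import tempfile


def data_dir_mode(allow_group_access: bool) -> int:
    """Mode described above: 0o700 by default, 0o750 with group access."""
    return 0o750 if allow_group_access else 0o700


def describe(mode: int) -> str:
    """Render directory permission bits in ls-style form."""
    return stat.filemode(stat.S_IFDIR | mode)


def apply_mode(path: str, allow_group_access: bool) -> int:
    """Force the chosen mode onto an existing directory and read it back."""
    os.chmod(path, data_dir_mode(allow_group_access))
    return stat.S_IMODE(os.stat(path).st_mode)


def demo():
    """Apply both modes to a throwaway directory (POSIX systems only)."""
    with tempfile.TemporaryDirectory() as path:
        return apply_mode(path, False), apply_mode(path, True)
```

Here `describe(data_dir_mode(False))` gives `drwx------` and `describe(data_dir_mode(True))` gives `drwxr-x---`, i.e. group access adds read and execute for the group and nothing for others.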

Feel free to suggest something if similar wording should be added for
non-POSIX systems.

I intend to try and integrate something like the above to replace the
existing paragraph in the next version.

Thank you for the review!

David J.

P.S. I'm now looking at the very first paragraph to initdb more closely,
not liking "single server instance" all that much and wondering how to fit
in "cluster user" there - possibly by saying something like "...managed by
a single server process, and physical data directory, whose effective user
and owner respectively is called the cluster user. That user must exist
and be used to execute this program."

Then the whole "initdb must be run as..." paragraph can probably just go
away. Moving the commentary about "root", again a non-Windows thing, to
the notes area.

#4 David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#3)
1 attachment(s)
Re: Glossary and initdb definition work for "superuser" and database/cluster

On Tue, Nov 1, 2022 at 6:59 PM David G. Johnston <david.g.johnston@gmail.com>
wrote:

P.S. I'm now looking at the very first paragraph to initdb more closely,
not liking "single server instance" all that much and wondering how to fit
in "cluster user" there - possibly by saying something like "...managed by
a single server process, and physical data directory, whose effective user
and owner respectively is called the cluster user. That user must exist
and be used to execute this program."

Then the whole "initdb must be run as..." paragraph can probably just go
away. Moving the commentary about "root", again a non-Windows thing, to
the notes area.

Version 2 attached, some significant re-working. Starting to think that
initdb isn't the place for some of this content - in particular the stuff
I'm deciding to move down to the Notes section. Might consider moving some
of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
nearby...) [1].

I settled on "cluster owner" over "cluster user" and made the terminology
consistent throughout initdb and the glossary (haven't looked at chapter 19
yet). Also added it to the glossary.

Moved quite a bit of material to notes from the description and options and
expanded upon what had already been said based upon various discussions
I've been part of on the mailing lists.

Decided to call out, in the glossary, the effective equivalence of database
superuser and cluster owner. Which acts as an explanation as to why root
is prohibited to be a cluster owner.

David J.

[1]: https://www.postgresql.org/docs/current/creating-cluster.html

Attachments:

initdb-and-glossary-v2.patch (application/octet-stream)
commit 2873a9c32f0843f227c43f234960e48b3b118a49
Author: David G. Johnston <David.G.Johnston@Gmail.com>
Date:   Tue Nov 1 22:48:58 2022 +0000

    initdb and glossary

diff --git a/doc/src/sgml/adminpack.sgml b/doc/src/sgml/adminpack.sgml
index 1150b7f5bb..184e96d7a0 100644
--- a/doc/src/sgml/adminpack.sgml
+++ b/doc/src/sgml/adminpack.sgml
@@ -12,7 +12,7 @@
   <application>pgAdmin</application> and other administration and management tools can
   use to provide additional functionality, such as remote management
   of server log files.
-  Use of all these functions is only allowed to the superuser by default but may be
+  Use of all these functions is only allowed to database superusers by default, but may be
   allowed to other users by using the <command>GRANT</command> command.
  </para>
 
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index d6d0a3a814..f0c4f3a389 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -233,6 +233,25 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-bootstrap-superuser">
+   <glossterm>Bootstrap superuser</glossterm>
+   <glossdef>
+    <para>
+     The first user initialized in a
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm>.
+    </para>
+    <para>
+     This user owns all system catalog tables in each database.  It also is the role
+     from which all granted permissions originate.  Because of these things, this
+     role may not be dropped.
+    </para>
+    <para>
+     This role also behaves as a normal
+     <glossterm linkend="glossary-database-superuser">database superuser</glossterm>.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-cast">
    <glossterm>Cast</glossterm>
    <glossdef>
@@ -342,6 +361,28 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-cluster-owner">
+   <glossterm>Cluster owner</glossterm>
+   <glossdef>
+    <para>
+     This is the term given to identify the operating system user that owns
+     the data directory and under which the <literal>postgres</literal> process is run.
+     It is required that this user exist prior to creating a new
+     <glossterm linkend="glossary-db-cluster">database cluster</glossterm>.
+    </para>
+    <para>
+     On operating systems with a <literal>root</literal> user,
+     the <literal>root</literal> user must not be the cluster owner.
+    </para>
+    <para>
+     A cluster owner, by virtue of having full access to the database files, also can assume
+     the identity of any <glossterm linkend="glossary-database-superuser">database superuser</glossterm>.
+     The reverse is also true since <productname>PostgreSQL</productname> provides various ways
+     to execute commands in the operating system.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-column">
    <glossterm>Column</glossterm>
    <glossdef>
@@ -475,12 +516,20 @@
      and their common static and dynamic metadata.
      Sometimes referred to as a
      <firstterm>cluster</firstterm>.
+     A database cluster is created using the
+     <xref linkend="app-initdb" /> program.
     </para>
     <para>
      In <productname>PostgreSQL</productname>, the term
      <firstterm>cluster</firstterm> is also sometimes used to refer to an instance.
      (Don't confuse this term with the SQL command <command>CLUSTER</command>.)
     </para>
+    <para>
+      See also <glossterm linkend="glossary-cluster-owner">cluster owner</glossterm>,
+      the operating-system owner of a cluster,
+      and <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>,
+      the <productname>PostgreSQL</productname> owner of a cluster.
+    </para>
    </glossdef>
   </glossentry>
 
@@ -489,6 +538,29 @@
    <glosssee otherterm="glossary-instance" />
   </glossentry>
 
+  <glossentry id="glossary-database-superuser">
+   <glossterm>Database superuser</glossterm>
+   <glossdef>
+    <para>
+     A role having the <literal>superuser</literal> attribute (see <xref linkend="role-attributes"/>).
+    </para>
+    <para>
+     All superusers in the system are collectively referred to as database superusers throughout
+     the documentation.  Any plain use of the term
+     <glossterm linkend="glossary-superuser">superuser</glossterm>
+     can be interpreted to mean database superuser.
+    </para>
+    <para>
+     While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
+     a database superuser, it has special obligations and restrictions that plain database superusers do not.
+    </para>
+    <para>
+     A database superuser, through various facilities provided by the server, also assumes the identity of
+     the <glossterm linkend="glossary-cluster-owner">cluster owner</glossterm>.  And vice-versa.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-data-directory">
    <glossterm>Data directory</glossterm>
    <glossdef>
@@ -1577,6 +1649,16 @@
    </glossdef>
   </glossentry>
 
+  <glossentry id="glossary-superuser">
+   <glossterm>Superuser</glossterm>
+   <glossdef>
+    <para>
+     As used in this documentation, it is a synonym for
+     <glossterm linkend="glossary-database-superuser">database superuser</glossterm>.
+    </para>
+   </glossdef>
+  </glossentry>
+
   <glossentry id="glossary-system-catalog">
    <glossterm>System catalog</glossterm>
    <glossdef>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 8158896298..96442fd68c 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -37,17 +37,18 @@ PostgreSQL documentation
   <title>Description</title>
   <para>
    <command>initdb</command> creates a new
-   <productname>PostgreSQL</productname> database cluster.  A database
-   cluster is a collection of databases that are managed by a single
-   server instance.
+   <productname>PostgreSQL</productname> <glossterm linkend="glossary-db-cluster">database cluster</glossterm>.
   </para>
 
   <para>
-   Creating a database cluster consists of creating the directories in
-   which the database data will live, generating the shared catalog
+   Creating a <glossterm linkend="glossary-db-cluster">database cluster</glossterm>
+   consists of creating the directories in
+   which the cluster data will live, generating the shared catalog
    tables (tables that belong to the whole cluster rather than to any
-   particular database), and creating the <literal>postgres</literal>,
-   <literal>template1</literal>, and <literal>template0</literal> databases.
+   particular database), creating the <literal>postgres</literal>,
+   <literal>template1</literal>, and <literal>template0</literal> databases,
+   and creating the
+   <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>.
    The <literal>postgres</literal> database is a default database meant
    for use by users, utilities and third party applications.
    <literal>template1</literal> and <literal>template0</literal> are
@@ -59,30 +60,19 @@ PostgreSQL documentation
   </para>
 
   <para>
-   Although <command>initdb</command> will attempt to create the
-   specified data directory, it might not have permission if the parent
-   directory of the desired data directory is root-owned. To initialize
-   in such a setup, create an empty data directory as root, then use
-   <command>chown</command> to assign ownership of that directory to the
-   database user account, then <command>su</command> to become the
-   database user to run <command>initdb</command>.
+   <command>initdb</command> must be run by the intended
+   <glossterm linkend="glossary-cluster-owner">cluster owner</glossterm>.
+   The initialization process includes creating the data directory and
+   setting its permissions and ownership.  You must ensure either that the
+   cluster owner is able to create that directory, or that such a directory
+   already exists and is empty. See the notes section for more details
+   regarding the security of the data directory.
   </para>
 
   <para>
-   <command>initdb</command> must be run as the user that will own the
-   server process, because the server needs to have access to the
-   files and directories that <command>initdb</command> creates.
-   Since the server cannot be run as root, you must not run
-   <command>initdb</command> as root either.  (It will in fact refuse
-   to do so.)
-  </para>
-
-  <para>
-    For security reasons the new cluster created by <command>initdb</command>
-    will only be accessible by the cluster owner by default.  The
-    <option>--allow-group-access</option> option allows any user in the same
-    group as the cluster owner to read files in the cluster.  This is useful
-    for performing backups as a non-privileged user.
+   By default, the cluster owner's user name will be used for the newly created
+   <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>'s
+   role name. You can override this via the <option>--username</option> option.
   </para>
 
   <para>
@@ -196,7 +186,7 @@ PostgreSQL documentation
         <command>initdb</command>, but you can avoid writing it by
         setting the <envar>PGDATA</envar> environment variable, which
         can be convenient since the database server
-        (<command>postgres</command>) can find the database
+        (<command>postgres</command>) can find the data
         directory later by the same variable.
        </para>
       </listitem>
@@ -223,10 +213,14 @@ PostgreSQL documentation
       <term><option>--allow-group-access</option></term>
       <listitem>
        <para>
-        Allows users in the same group as the cluster owner to read all cluster
-        files created by <command>initdb</command>.  This option is ignored
-        on <productname>Windows</productname> as it does not support
-        <acronym>POSIX</acronym>-style group permissions.
+        Ignored by systems that do not use <acronym>POSIX</acronym>-style
+        group permissions (most notably Windows). Forces the group permission
+        flag to read+execute from the default of no permissions.  See notes for
+        more details.
+       </para>
+       <para>
+        This is mainly useful for enabling an unprivileged account to perform system
+        administration tasks, like backups, on the directory.
        </para>
       </listitem>
      </varlistentry>
@@ -338,7 +332,7 @@ PostgreSQL documentation
       <term><option>--pwfile=<replaceable>filename</replaceable></option></term>
       <listitem>
        <para>
-        Makes <command>initdb</command> read the database superuser's password
+        Makes <command>initdb</command> read the bootstrap superuser's password
         from a file.  The first line of the file is taken as the password.
        </para>
       </listitem>
@@ -349,7 +343,7 @@ PostgreSQL documentation
       <term><option>--sync-only</option></term>
       <listitem>
        <para>
-        Safely write all database files to disk and exit.  This does not
+        Safely write all database cluster files to disk and exit.  This does not
         perform any of the normal <application>initdb</application> operations.
         Generally, this option is useful for ensuring reliable recovery after
         changing <xref linkend="guc-fsync"/> from <literal>off</literal> to
@@ -374,12 +368,11 @@ PostgreSQL documentation
       <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
       <listitem>
        <para>
-        Selects the user name of the database superuser. This defaults
-        to the name of the effective user running
-        <command>initdb</command>. It is really not important what the
-        superuser's name is, but one might choose to keep the
-        customary name <systemitem>postgres</systemitem>, even if the operating
-        system user's name is different.
+        Selects the user name of the
+        <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm>.
+        This defaults to the name of the
+        <glossterm linkend="glossary-cluster-owner">cluster owner</glossterm>.
+        See the notes section for more details as to how these names are used.
        </para>
       </listitem>
      </varlistentry>
@@ -390,7 +383,7 @@ PostgreSQL documentation
       <listitem>
        <para>
         Makes <command>initdb</command> prompt for a password
-        to give the database superuser. If you don't plan on using password
+        to give the bootstrap superuser. If you don't plan on using password
         authentication, this is not important.  Otherwise you won't be
         able to use password authentication until you have a password
         set up.
@@ -422,7 +415,7 @@ PostgreSQL documentation
 
        <para>
         It may be useful to adjust this size to control the granularity of
-        WAL log shipping or archiving.  Also, in databases with a high volume
+        WAL log shipping or archiving.  Also, in clusters with a high volume
         of WAL, the sheer number of WAL files per directory can become a
         performance and management problem.  Increasing the WAL file size
         will reduce the number of WAL files.
@@ -569,10 +562,43 @@ PostgreSQL documentation
  <refsect1>
   <title>Notes</title>
 
+  <para>
+   On systems using <acronym>POSIX</acronym>-style permissions the data directory,
+   and all of its contents, will be forced to have permissions of 700, though you can
+   use <option>--allow-group-access</option> to instead force 750.  In either case,
+   the effective user running <command>initdb</command> (i.e., the
+   <glossterm linkend="glossary-cluster-owner">cluster owner</glossterm>)
+   will become the owner and group for the data directory and its contents.
+  </para>
+
+  <para>
+   On systems with a special <literal>root</literal> user
+   <productname>PostgreSQL</productname> will fail if the
+   <glossterm linkend="glossary-cluster-owner">cluster owner</glossterm>
+   is <literal>root</literal>.  This probably conflicts with the fact that the
+   data directory is likely to be placed in a location owned by root.
+   In this situation the suggested process is to create an empty data directory
+   as root, then use <command>chown</command> to assign ownership of that directory
+   to the cluster owner, then <command>su</command> to become the
+   cluster owner to run <command>initdb</command>.
+  </para>
+
+  <para>
+   It is not important what the bootstrap superuser's name is, but one might choose to keep the
+   customary name <systemitem>postgres</systemitem>, even if the cluster owner's name is different.
+   This is convenient because the default database name is <literal>postgres</literal>, and the libpq
+   default connection convention is that the name of the database being connected to matches the user name
+   making the connection.  However, peer authentication relies on the operating system user and
+   database user names being identical (unless you use an identity map), so leaving the bootstrap superuser
+   name equal to the cluster owner name makes connecting as the bootstrap superuser a bit less cumbersome.
+   Having all three be the conventional <literal>postgres</literal> meets both conventions.
+  </para>
+
   <para>
    <command>initdb</command> can also be invoked via
    <command>pg_ctl initdb</command>.
   </para>
+
  </refsect1>
 
  <refsect1>
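
For readers unfamiliar with the octal modes named in the proposed Notes text (700 by default, 750 with --allow-group-access), here is a minimal sketch of what those two values mean for a directory. It is only an illustration: the constant and function names are made up for this example, and it does not reflect how initdb itself sets permissions.

```python
import stat

# Modes named in the patch's Notes paragraph (hypothetical constant names).
DEFAULT_MODE = 0o700        # owner: rwx, group: none, others: none
GROUP_ACCESS_MODE = 0o750   # owner: rwx, group: r-x,  others: none


def describe(mode: int) -> str:
    """Render a directory mode the way `ls -l` would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe(DEFAULT_MODE))       # drwx------
print(describe(GROUP_ACCESS_MODE))  # drwxr-x---
```

The only difference between the two is the group read and execute bits, which is what lets an unprivileged account in the owner's group read the directory for tasks such as backups.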
#5Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: David G. Johnston (#4)
Re: Glossary and initdb definition work for "superuser" and database/cluster

On 2022-Nov-02, David G. Johnston wrote:

Version 2 attached, some significant re-working. Starting to think that
initdb isn't the place for some of this content - in particular the stuff
I'm deciding to move down to the Notes section. Might consider moving some
of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
nearby...) [1].

I settled on "cluster owner" over "cluster user" and made the terminology
consistent throughout initdb and the glossary (haven't looked at chapter 19
yet). Also added it to the glossary.

Generally speaking, I like the idea of documenting these things.
However it sounds like you're not done with the wording and editing, so
I'm not committing the whole patch, but it seems a good starting point
to at least have some basic definitions. So I've extracted them from
your patch and pushed those. You can already see it at
https://www.postgresql.org/docs/devel/glossary.html

I left out almost all the material from the patch that's not in the
glossary proper, and also a few phrases in the glossary itself. Some of
these sounded like security considerations rather than part of the
definitions. I think we should have a separate chapter in Part III
(Server Administration) that explains many security aspects; right now
there's no hope of collecting a lot of very important advice in a single
place, so a wannabe admin has no chance of getting things right. That
seems to me a serious deficiency. A new chapter could provide a lot of
general advice on every aspect that needs to be considered, and link to
the reference section for additional details. Maybe part of these
initdb considerations could be there, too.

Moved quite a bit of material to notes from the description and options and
expanded upon what had already been said based upon various discussions
I've been part of on the mailing lists.

Please rebase.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Always assume the user will do much worse than the stupidest thing
you can imagine." (Julien PUYDT)

#6David G. Johnston
david.g.johnston@gmail.com
In reply to: Alvaro Herrera (#5)
Re: Glossary and initdb definition work for "superuser" and database/cluster

On Fri, Nov 18, 2022 at 4:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:

On 2022-Nov-02, David G. Johnston wrote:

Version 2 attached, some significant re-working. Starting to think that
initdb isn't the place for some of this content - in particular the stuff
I'm deciding to move down to the Notes section. Might consider moving some
of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
nearby...) [1].

I settled on "cluster owner" over "cluster user" and made the terminology
consistent throughout initdb and the glossary (haven't looked at chapter 19
yet). Also added it to the glossary.

Generally speaking, I like the idea of documenting these things.
However it sounds like you're not done with the wording and editing, so
I'm not committing the whole patch, but it seems a good starting point
to at least have some basic definitions. So I've extracted them from
your patch and pushed those. You can already see it at
https://www.postgresql.org/docs/devel/glossary.html

Agreed on the not quite ready yet, and that the glossary is indeed
self-contained enough to go in by itself at this point. Thank you for
doing that.

I left out almost all the material from the patch that's not in the
glossary proper, and also a few phrases in the glossary itself. Some of
these sounded like security considerations rather than part of the
definitions. I think we should have a separate chapter in Part III
(Server Administration) that explains many security aspects; right now
there's no hope of collecting a lot of very important advice in a single
place, so a wannabe admin has no chance of getting things right. That
seems to me a serious deficiency. A new chapter could provide a lot of
general advice on every aspect that needs to be considered, and link to
the reference section for additional details. Maybe part of these
initdb considerations could be there, too.

I'll consider that approach as well as other spots in the documentation on
this next pass.

David J.