Glossary and initdb definition work for "superuser" and database/cluster
Hey,
Recent threads have pointed out some long-standing doc language in initdb
that could be made more precise, especially in light of the relatively
recent addition of a glossary. Toward this end I'm attaching a patch that
defines three terms: "bootstrap superuser", "database superuser" and
"superuser". I didn't add any extra-glossary links for the latter two but
did for the limited-in-scope bootstrap superuser that is really only
defined in initdb (actually, I suspect the authorization docs could use a
link too but haven't gone looking for an appropriate place yet).
In passing I also changed a few places where the documentation says
"database" when the thing being referred to is basically the file system
data directory, which is a cluster-scoped thing.
I did some grep'ing, though another pass or two is probably worthwhile.
For now I submit a preliminary patch for consideration and buy-in before
trying to polish it up.
David J.
Attachments:
initdb-and-glossary.diff (application/octet-stream, +73 -18)
On Tue, Nov 01, 2022 at 03:47:15PM -0700, David G. Johnston wrote:
Hey,
Recent threads have pointed out some long-standing doc language in initdb
that could be made more precise, especially in light of the relatively
recent addition of a glossary. Toward this end I'm attaching a patch that
defines three terms: "bootstrap superuser", "database superuser" and
"superuser". I didn't add any extra-glossary links for the latter two but
did for the limited-in-scope bootstrap superuser that is really only
defined in initdb (actually, I suspect the authorization docs could use a
link too but haven't gone looking for an appropriate place yet).
In passing I also changed a few places where the documentation says
"database" when the thing being referred to is basically the file system
data directory, which is a cluster-scoped thing.
I did some grep'ing, though another pass or two is probably worthwhile.
For now I submit a preliminary patch for consideration and buy-in before
trying to polish it up.
I think this is wrong:
| https://www.postgresql.org/docs/devel/app-initdb.html
| -U username
| --username=username
|
| Selects the user name of the database superuser. This defaults to
| the name of the effective user running initdb [...]
It's true that the user who runs initdb is typically named "postgres",
but that's only by convention.
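As a rough illustration of the quoted default (a sketch, not initdb's actual code): the Python below resolves the bootstrap superuser name the way the option text describes, using an explicit name when one is given and otherwise the effective OS user. The function name `bootstrap_superuser_name` is hypothetical, and the `pwd` lookup is POSIX-only.

```python
import os
import pwd
from typing import Optional


def bootstrap_superuser_name(username: Optional[str] = None) -> str:
    """Mimic the documented -U/--username default (hypothetical helper).

    If a name is passed explicitly, use it. Otherwise fall back to the
    name of the effective OS user, which need not be "postgres".
    """
    if username is not None:
        return username
    # Look up the passwd entry for the effective user id (POSIX only).
    return pwd.getpwuid(os.geteuid()).pw_name


if __name__ == "__main__":
    # Prints whatever account runs the script, e.g. your login name.
    print(bootstrap_superuser_name())
```

So "postgres" only appears as the default when the OS account happens to be named that way.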
+ This user owns all system catalog tables in each database. It also is the role
+ from which all granted permission originate. Because of these things this
+ role may not be dropped.
plural permissions
these comma
+ While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
+ a database superuser it has special obligations and restrictions that plain database superusers do not.
comma it
+ <glossentry id="glossary-superuser">
+ <glossterm>Superuser</glossterm>
+ <glossdef>
+ <para>
+ As used in this documentation it is a synonym for
comma it
Creating a database cluster consists of creating the directories in
- which the database data will live, generating the shared catalog
+ which the cluster data will live, generating the shared catalog
+1
tables (tables that belong to the whole cluster rather than to any
- particular database), and creating the <literal>postgres</literal>,
- <literal>template1</literal>, and <literal>template0</literal> databases.
+ particular database), creating the <literal>postgres</literal>,
+ <literal>template1</literal>, and <literal>template0</literal> databases,
+ and creating the
+ <glossterm linkend="glossary-bootstrap-superuser">boostrap superuser</glossterm>
+ (<literal>postgres</literal>, by default).
"postgres" is wrong
For security reasons the new cluster created by <command>initdb</command>
- will only be accessible by the cluster owner by default. The
+ will only be accessible by the cluster user by default. The
I prefer "cluster owner"
<command>initdb</command>, but you can avoid writing it by setting the
<envar>PGDATA</envar> environment variable, which can be convenient since
the database server
- (<command>postgres</command>) can find the database
+ (<command>postgres</command>) can find the data
directory later by the same variable.
+1
- Makes <command>initdb</command> read the database superuser's password
+ Makes <command>initdb</command> read the bootstrap superuser's password
from a file. The first line of the file is taken as the password.
+1
- Safely write all database files to disk and exit. This does not
+ Safely write all database cluster files to disk and exit. This does not
+1
It may be useful to adjust this size to control the granularity of
- WAL log shipping or archiving. Also, in databases with a high volume
+ WAL log shipping or archiving. Also, in clusters with a high volume
of WAL, the sheer number of WAL files per directory can become a
+1
On Tue, Nov 1, 2022 at 5:20 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, Nov 01, 2022 at 03:47:15PM -0700, David G. Johnston wrote:
I think this is wrong:
| https://www.postgresql.org/docs/devel/app-initdb.html
| -U username
| --username=username
|
| Selects the user name of the database superuser. This defaults to
| the name of the effective user running initdb [...]
It's true that the user who runs initdb is typically named "postgres",
but that's only by convention.
Thanks. I feel bad for missing this one given that I've been working on
fixing up the default libpq user name wording.
+ This user owns all system catalog tables in each database. It also is the role
+ from which all granted permission originate. Because of these things this
+ role may not be dropped.
plural permissions
+1
these comma
things comma actually (+0.5)
+ While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
+ a database superuser it has special obligations and restrictions that plain database superusers do not.
comma it
+0.5
tables (tables that belong to the whole cluster rather than to any
- particular database), and creating the <literal>postgres</literal>,
- <literal>template1</literal>, and <literal>template0</literal> databases.
+ particular database), creating the <literal>postgres</literal>,
+ <literal>template1</literal>, and <literal>template0</literal> databases,
+ and creating the
+ <glossterm linkend="glossary-bootstrap-superuser">boostrap superuser</glossterm>
+ (<literal>postgres</literal>, by default).
"postgres" is wrong
Yep, will give this another look to see if anywhere but the actual option
description wants to cover how this really works (or maybe just point the
reader there).
For security reasons the new cluster created by <command>initdb</command>
- will only be accessible by the cluster owner by default. The
+ will only be accessible by the cluster user by default. The
I prefer "cluster owner"
I'll either need to change it back or fix the one in the next sentence...
I'm still leaning toward continuing to use cluster user like everywhere
else on the page instead of adding a new term. The fact that this doesn't
work on Windows makes having it in the description section at all
arguable. I'd rather rewrite it something like:
"On POSIX systems, the resulting data directory, and all of its contents,
will have permissions of 700, though you can use --allow-group-access to
instead get 750. In either case, the effective user running initdb will
become the owner and group for the files created within the data directory."
(I haven't tried to prove this owner:group dynamic, but having 700 or 750
and specifying the alternative does result in the directory having its
permission bits changed during initdb)
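To make the 700 versus 750 distinction concrete, here is a minimal Python sketch (POSIX only) that reads a directory's permission bits and reports which of the two modes it has. The helper name `data_dir_access` and its return labels are made up for illustration; it checks only the top-level directory, not its contents or ownership.

```python
import os
import stat
import tempfile


def data_dir_access(path: str) -> str:
    """Classify a directory's permission bits (hypothetical helper)."""
    # S_IMODE strips the file-type bits, leaving only the permission bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode == 0o700:
        return "owner-only"      # the default described above
    if mode == 0o750:
        return "group-access"    # what --allow-group-access is said to give
    return f"other ({mode:o})"


if __name__ == "__main__":
    # Stand-in for a data directory; a real check would point at PGDATA.
    demo = tempfile.mkdtemp()
    os.chmod(demo, 0o700)
    print(data_dir_access(demo))  # owner-only
    os.chmod(demo, 0o750)
    print(data_dir_access(demo))  # group-access
    os.rmdir(demo)
```

Running it against an actual data directory after initdb would be one way to confirm the mode bits, though it says nothing about the owner:group question.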
Feel free to suggest something if similar wording should be added for
non-POSIX systems.
I intend to try and integrate something like the above to replace the
existing paragraph in the next version.
Thank you for the review!
David J.
P.S. I'm now looking at the very first paragraph to initdb more closely,
not liking "single server instance" all that much and wondering how to fit
in "cluster user" there - possibly by saying something like "...managed by
a single server process, and physical data directory, whose effective user
and owner respectively is called the cluster user. That user must exist
and be used to execute this program."
Then the whole "initdb must be run as..." paragraph can probably just go
away. Moving the commentary about "root", again a non-Windows thing, to
the notes area.
On Tue, Nov 1, 2022 at 6:59 PM David G. Johnston <david.g.johnston@gmail.com>
wrote:
P.S. I'm now looking at the very first paragraph to initdb more closely,
not liking "single server instance" all that much and wondering how to fit
in "cluster user" there - possibly by saying something like "...managed by
a single server process, and physical data directory, whose effective user
and owner respectively is called the cluster user. That user must exist
and be used to execute this program."
Then the whole "initdb must be run as..." paragraph can probably just go
away. Moving the commentary about "root", again a non-Windows thing, to
the notes area.
Version 2 attached, some significant re-working. Starting to think that
initdb isn't the place for some of this content - in particular the stuff
I'm deciding to move down to the Notes section. Might consider moving some
of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
nearby...) [1].
I settled on "cluster owner" over "cluster user" and made the terminology
consistent throughout initdb and the glossary (haven't looked at chapter 19
yet). Also added it to the glossary.
Moved quite a bit of material to notes from the description and options and
expanded upon what had already been said based upon various discussions
I've been part of on the mailing lists.
Decided to call out, in the glossary, the effective equivalence of database
superuser and cluster owner, which also explains why root is prohibited
from being a cluster owner.
David J.
[1]: https://www.postgresql.org/docs/current/creating-cluster.html
Attachments:
initdb-and-glossary-v2.patch (application/octet-stream, +152 -44)
On 2022-Nov-02, David G. Johnston wrote:
Version 2 attached, some significant re-working. Starting to think that
initdb isn't the place for some of this content - in particular the stuff
I'm deciding to move down to the Notes section. Might consider moving some
of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
nearby...) [1].
I settled on "cluster owner" over "cluster user" and made the terminology
consistent throughout initdb and the glossary (haven't looked at chapter 19
yet). Also added it to the glossary.
Generally speaking, I like the idea of documenting these things.
However it sounds like you're not done with the wording and editing, so
I'm not committing the whole patch, but it seems a good starting point
to at least have some basic definitions. So I've extracted them from
your patch and pushed those. You can already see it at
https://www.postgresql.org/docs/devel/glossary.html
I left out almost all the material from the patch that's not in the
glossary proper, and also a few phrases in the glossary itself. Some of
these sounded like security considerations rather than part of the
definitions. I think we should have a separate chapter in Part III
(Server Administration) that explains many security aspects; right now
there's no hope of collecting a lot of very important advice in a single
place, so a wannabe admin has no chance of getting things right. That
seems to me a serious deficiency. A new chapter could provide a lot of
general advice on every aspect that needs to be considered, and link to
the reference section for additional details. Maybe part of these
initdb considerations could be there, too.
Moved quite a bit of material to notes from the description and options and
expanded upon what had already been said based upon various discussions
I've been part of on the mailing lists.
Please rebase.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Always assume the user will do much worse than the stupidest thing
you can imagine." (Julien PUYDT)
On Fri, Nov 18, 2022 at 4:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org>
wrote:
On 2022-Nov-02, David G. Johnston wrote:
Version 2 attached, some significant re-working. Starting to think that
initdb isn't the place for some of this content - in particular the stuff
I'm deciding to move down to the Notes section. Might consider moving some
of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
nearby...) [1].
I settled on "cluster owner" over "cluster user" and made the terminology
consistent throughout initdb and the glossary (haven't looked at chapter 19
yet). Also added it to the glossary.
Generally speaking, I like the idea of documenting these things.
However it sounds like you're not done with the wording and editing, so
I'm not committing the whole patch, but it seems a good starting point
to at least have some basic definitions. So I've extracted them from
your patch and pushed those. You can already see it at
https://www.postgresql.org/docs/devel/glossary.html
Agreed on the not quite ready yet, and that the glossary is indeed
self-contained enough to go in by itself at this point. Thank you for
doing that.
I left out almost all the material from the patch that's not in the
glossary proper, and also a few phrases in the glossary itself. Some of
these sounded like security considerations rather than part of the
definitions. I think we should have a separate chapter in Part III
(Server Administration) that explains many security aspects; right now
there's no hope of collecting a lot of very important advice in a single
place, so a wannabe admin has no chance of getting things right. That
seems to me a serious deficiency. A new chapter could provide a lot of
general advice on every aspect that needs to be considered, and link to
the reference section for additional details. Maybe part of these
initdb considerations could be there, too.
I'll consider that approach as well as other spots in the documentation on
this next pass.
David J.