initdb / bootstrap design
Hi,
[1]: /messages/by-id/20220216012953.6d7bzmsblqou3ru4@alap3.anarazel.de
To me the division of labor between initdb and bootstrap doesn't make much
sense anymore:
initdb reads postgres.bki, replaces a few tokens, starts postgres in bootstrap
mode, and then painstakenly feeds bootstrap.bki lines to the server.
Given that bootstrap mode parsing is a dedicated parser, only invoked from a
single point, what's the point of initdb doing the preprocessing and then
incurring pipe overhead?
Sure, there's a few tokens that we replace in initdb. As it turns out there's
only two rows that are actually variable. The username of the initial
superuser in pg_authid and the pg_database row for template 1, where encoding,
lc_collate and lc_ctype varies. The rest is all compile time constant
replacements we could do as part of genbki.pl.
It seems we could save a good number of context switches by opening
postgres.bki just before boot_yyparse() in BootstrapModeMain() and having the
parser read it. The pg_authid / pg_database rows we could just do via
explicit insertions in BootstrapModeMain(), provided by commandline args?
Similarly, since the introduction of extensions at the latest, the server
knows how to execute SQL from a file. Why don't we just process
information_schema.sql, system_views.sql et al that way?
If we don't need a dedicated "input" mode feeding boot_yyparse() in bootstrap
mode anymore (because bootstrap mode feeds it from postgres.bki directly), we
likely could avoid the restart between bootstrap and single user mode. Afaics
that only really is needed because we need to send SQL after
bootstrap_template1(). That'd likely be a nice speedup, because we don't need
to write the bootstrap contents from shared buffers to the OS just to read
them back in single user mode.
I don't plan to work on this immediately, but I thought it's worth bringing up
anyway.
Greetings,
Andres Freund
[1]: /messages/by-id/20220216012953.6d7bzmsblqou3ru4@alap3.anarazel.de
On Wed, Feb 16, 2022 at 9:12 AM Andres Freund <andres@anarazel.de> wrote:
To me the division of labor between initdb and bootstrap doesn't make much
sense anymore:
[...]
I don't plan to work on this immediately, but I thought it's worth bringing up
anyway.
Added TODO item "Rationalize division of labor between initdb and bootstrap"
--
John Naylor
EDB: http://www.enterprisedb.com
On 16.02.22 03:12, Andres Freund wrote:
Sure, there's a few tokens that we replace in initdb. As it turns out there's
only two rows that are actually variable. The username of the initial
superuser in pg_authid and the pg_database row for template 1, where encoding,
lc_collate and lc_ctype varies. The rest is all compile time constant
replacements we could do as part of genbki.pl.It seems we could save a good number of context switches by opening
postgres.bki just before boot_yyparse() in BootstrapModeMain() and having the
parser read it. The pg_authid / pg_database rows we could just do via
explicit insertions in BootstrapModeMain(), provided by commandline args?
I think we could do the locale setup by updating the pg_database row of
template1 after bootstrap, as in the attached patch. (The order of
proceedings in the surrounding function might need some refinement in a
final patch.) I suspect we could do the treatment of pg_authid similarly.
Attachments:
0001-Simplify-locale-setup-of-template1-in-initdb.patchtext/plain; charset=UTF-8; name=0001-Simplify-locale-setup-of-template1-in-initdb.patchDownload+8-22
Hi,
On 2022-02-16 11:47:31 +0100, Peter Eisentraut wrote:
On 16.02.22 03:12, Andres Freund wrote:
Sure, there's a few tokens that we replace in initdb. As it turns out there's
only two rows that are actually variable. The username of the initial
superuser in pg_authid and the pg_database row for template 1, where encoding,
lc_collate and lc_ctype varies. The rest is all compile time constant
replacements we could do as part of genbki.pl.It seems we could save a good number of context switches by opening
postgres.bki just before boot_yyparse() in BootstrapModeMain() and having the
parser read it. The pg_authid / pg_database rows we could just do via
explicit insertions in BootstrapModeMain(), provided by commandline args?I think we could do the locale setup by updating the pg_database row of
template1 after bootstrap, as in the attached patch.
Another solution could be to have bootstrap create template0 instead of
template1. I think for template0 it'd more accurate to have a hardcoded C
collation and ascii encoding (which I don't think we actually have?).
I suspect we could do the treatment of pg_authid similarly.
Yea.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Sure, there's a few tokens that we replace in initdb. As it turns out there's
only two rows that are actually variable. The username of the initial
superuser in pg_authid and the pg_database row for template 1, where encoding,
lc_collate and lc_ctype varies. The rest is all compile time constant
replacements we could do as part of genbki.pl.
I remembered the reason why it's done that way: if we replaced those
values during genbki.pl, the contents of postgres.bki would become
architecture-dependent, belying its distribution as a "share" file.
While we don't absolutely have to continue treating postgres.bki
as architecture-independent, I'm skeptical that there's enough win
here to justify a packaging change.
initdb is already plenty fast enough for any plausible production
usage; it's cases like check-world where we wish it were faster.
So I'm thinking what we really ought to pursue is the idea that's
been kicked around more than once of capturing the post-initdb
state of a cluster's files and just doing "cp -a" to duplicate that
later in the test run.
regards, tom lane
Hi,
On 2022-02-16 13:24:41 -0500, Tom Lane wrote:
I remembered the reason why it's done that way: if we replaced those
values during genbki.pl, the contents of postgres.bki would become
architecture-dependent, belying its distribution as a "share" file.
While we don't absolutely have to continue treating postgres.bki
as architecture-independent, I'm skeptical that there's enough win
here to justify a packaging change.
Hm. Architecturally I still would like to move it to be processed server
side. I'd like to eventually get rid of single user mode (but keep bootstrap,
at least for longer).
Seems we could make NAMEDATALEN, FLOAT8PASSBYVAL, ALIGNOF_POINTER,
FLOAT8PASSBYVAL stuff that bootparse knows about? And remove the need for
POSTGRES, ENCODING, LC_COLLATE, LC_CTYPE as discussed already?
initdb is already plenty fast enough for any plausible production
usage; it's cases like check-world where we wish it were faster.
It's not just our own usage though. I've seen it be a noticable time in test
suites of applications using postgres. And that's not really addressable with
the template approach, unless we want to move use of the template database
into initdb itself. I've thought about it, but then we'd need to do a lot more
than if it's just for our own tests.
So I'm thinking what we really ought to pursue is the idea that's
been kicked around more than once of capturing the post-initdb
state of a cluster's files and just doing "cp -a" to duplicate that
later in the test run.
Yea, we should pursue that independently of improving initdb's architecture /
speed. initdb will never be as fast as copying files around.
I kind of got stuck on how to deal with install.pl / vcregress.pl. For make
it's easy enough to create the template during during temp-install. But for
the msvc stuff is less clear when / where to create the template
database. Nearly everyone uses NO_TEMP_INSTALL on windows, because install is
so slow and happens in every test. But right now there's no command to create
the "temp" installation. Probably need something like a 'temp-install' command
for vcregress.pl and then convert the buildfarm to use that.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2022-02-16 13:24:41 -0500, Tom Lane wrote:
I remembered the reason why it's done that way: if we replaced those
values during genbki.pl, the contents of postgres.bki would become
architecture-dependent, belying its distribution as a "share" file.
Hm. Architecturally I still would like to move it to be processed server
side. I'd like to eventually get rid of single user mode (but keep bootstrap,
at least for longer).
Seems we could make NAMEDATALEN, FLOAT8PASSBYVAL, ALIGNOF_POINTER,
FLOAT8PASSBYVAL stuff that bootparse knows about? And remove the need for
POSTGRES, ENCODING, LC_COLLATE, LC_CTYPE as discussed already?
Yeah, I have no objection to doing it that way. It should be possible
to do those substitutions on a per-field basis, which'd be cleaner than
what initdb does now ...
regards, tom lane
On Wed, Feb 16, 2022 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
initdb is already plenty fast enough for any plausible production
usage; it's cases like check-world where we wish it were faster.It's not just our own usage though. I've seen it be a noticable time in test
suites of applications using postgres.
I'd just like to second this point.
I was working on an EDB proprietary software project for a while
which, because of the nature of what it did, ran initdb frequently in
its test suite. And it was unbelievably painful. The test suite just
took forever. Fortunately, it always ran initdb with the same options,
so somebody invented a mechanism for doing one initdb and saving the
results someplace and just copying them every time, and it made a huge
difference. Before that experience, I probably would have agreed with
the idea that there was no need at all for initdb to be any faster
than it is already. But, like, what if we'd been trying to run initdb
with different options for different tests, the way the core code
does? That seems like an entirely plausible thing to want to do, and
then caching becomes a real pain.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2/17/22 10:36, Robert Haas wrote:
On Wed, Feb 16, 2022 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
initdb is already plenty fast enough for any plausible production
usage; it's cases like check-world where we wish it were faster.It's not just our own usage though. I've seen it be a noticable time in test
suites of applications using postgres.I'd just like to second this point.
I was working on an EDB proprietary software project for a while
which, because of the nature of what it did, ran initdb frequently in
its test suite. And it was unbelievably painful. The test suite just
took forever. Fortunately, it always ran initdb with the same options,
so somebody invented a mechanism for doing one initdb and saving the
results someplace and just copying them every time, and it made a huge
difference. Before that experience, I probably would have agreed with
the idea that there was no need at all for initdb to be any faster
than it is already. But, like, what if we'd been trying to run initdb
with different options for different tests, the way the core code
does? That seems like an entirely plausible thing to want to do, and
then caching becomes a real pain.
Indeed. When initdb.c was written the testing landscape was very
different both for the community and for projects that used Postgres. So
we need to catch up.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Here's an initial patch that gets rid of the need for initdb to
change the contents of postgres.bki before feeding it to the
bootstrap backend. After this, we could look at having the
backend read the file directly.
I don't really detect any speed change from getting rid of initdb's
string manipulations, but TBH I was not expecting any. On my machine,
that was lost in the noise already, according to perf(1).
regards, tom lane
Attachments:
feed-postgres.bki-to-backend-unmodified-1.patchtext/x-diff; charset=us-ascii; name=feed-postgres.bki-to-backend-unmodified-1.patchDownload+72-64
Hi,
On 2022-02-19 18:35:18 -0500, Tom Lane wrote:
Here's an initial patch that gets rid of the need for initdb to
change the contents of postgres.bki before feeding it to the
bootstrap backend. After this, we could look at having the
backend read the file directly.
Cool!
I don't really detect any speed change from getting rid of initdb's
string manipulations, but TBH I was not expecting any. On my machine,
that was lost in the noise already, according to perf(1).
Yea, I'd not expect much either. The slowdown around the string stuff that I
did see was on windows.
I would however expect some, but not huge, speedup by getting rid of the
line-by-line reading/writing of postgres.bki, even without moving the handling
to the backend.
A quick way to prototype the moving the handlign to the backend would be to
just call postgres with input redirection from postgres.bki...
+ /* + * Ideally we'd change the superuser name with ALTER USER, but the backend + * will reject that with "session user cannot be renamed", so we must + * cheat. (In any case, we'd need a function to escape an identifier, not + * a string literal.) Likewise, we can't change template1's + * locale/encoding without cheating. + */ + static char *final_details[] = { + "UPDATE pg_authid SET rolname = E'SUPERUSER_NAME' WHERE rolname = 'POSTGRES';\n\n", + "UPDATE pg_database SET encoding = E'ENCODING', datcollate = E'LC_COLLATE', datctype = E'LC_CTYPE';\n\n", + NULL + }; + + detail_lines = replace_token(final_details, "SUPERUSER_NAME", + escape_quotes(username)); + detail_lines = replace_token(detail_lines, "ENCODING", + encodingid_to_string(encodingid)); + detail_lines = replace_token(detail_lines, "LC_COLLATE", + escape_quotes(lc_collate)); + detail_lines = replace_token(detail_lines, "LC_CTYPE", + escape_quotes(lc_ctype));
Hm, wouldn't it be less code to just use printf?
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
A quick way to prototype the moving the handlign to the backend would be to
just call postgres with input redirection from postgres.bki...
Hmm. I was thinking of inventing an include-file command in the
BKI language, and making initdb just send an INCLUDE command.
That's arguably overkill for the immediate need, but it looks like it
requires just a few lines of code (flex provides pretty much all of the
infrastructure already), and maybe we'd find another use for it later.
However, redirection does sound like a very easy answer ...
Hm, wouldn't it be less code to just use printf?
Meh --- it'd be different from the way we do it in the rest
of initdb, and it would not be "less code". Maybe it'd run
a shade faster, but I refuse to believe that that'd be
enough to matter.
regards, tom lane
Hi,
On February 19, 2022 4:39:38 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@anarazel.de> writes:
A quick way to prototype the moving the handlign to the backend would be to
just call postgres with input redirection from postgres.bki...Hmm. I was thinking of inventing an include-file command in the
BKI language, and making initdb just send an INCLUDE command.
That's arguably overkill for the immediate need, but it looks like it
requires just a few lines of code (flex provides pretty much all of the
infrastructure already), and maybe we'd find another use for it later.However, redirection does sound like a very easy answer ...
Medium term I'd rather do neither, because I'd like to avoid the restart in-between bootstrap and the various sql files. But short term redirection redirection might be good enough - it does mostly work on windows I think ...
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
I wrote:
However, redirection does sound like a very easy answer ...
I tried it like that (full patch attached) and the results are
intensely disappointing. On my Mac laptop, the time needed for
50 iterations of initdb drops from 16.8 sec to 16.75 sec.
On my RHEL8 workstation, the change is actually in the wrong
direction, from 18.75s to 18.9s. I conclude that the time
spent on postgres.bki data transfer is so far down in the noise
as to be overwhelmed by irrelevancies. (Which, in fact, is
what perf told me before I started --- but I'd hoped that the
number of system calls would diminish noticeably. Seems not.)
Not sure that this is worth pursuing any further.
regards, tom lane
Attachments:
make-backend-read-postgres.bki-directly-1.patchtext/x-diff; charset=us-ascii; name=make-backend-read-postgres.bki-directly-1.patchDownload+89-81
Hi,
On 2022-02-19 20:46:26 -0500, Tom Lane wrote:
I tried it like that (full patch attached) and the results are intensely
disappointing. On my Mac laptop, the time needed for 50 iterations of
initdb drops from 16.8 sec to 16.75 sec.
Hm. I'd hoped for at least a little bit bigger win. But I think it enables
more, see below:
Not sure that this is worth pursuing any further.
I experimented with moving all the bootstrapping into --boot mode and got it
working. Albeit definitely with a few hacks (more below).
While I had hoped for a bit more of a win, it's IMO a nice improvement.
Executing 10 initdb -N --wal-segsize 1 in a loop:
HEAD:
assert:
8.06user 1.17system 0:09.25elapsed 99%CPU (0avgtext+0avgdata 91724maxresident)k
0inputs+549280outputs (40major+99824minor)pagefaults 0swaps
opt:
2.89user 0.99system 0:04.81elapsed 80%CPU (0avgtext+0avgdata 88864maxresident)k
0inputs+549280outputs (40major+99792minor)pagefaults 0swaps
default to lz4:
assert:
7.61user 1.03system 0:08.69elapsed 99%CPU (0avgtext+0avgdata 91508maxresident)k
0inputs+546400outputs (42major+99551minor)pagefaults 0swaps
opt:
2.55user 0.94system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 88816maxresident)k
0inputs+546400outputs (40major+99551minor)pagefaults 0swaps
bootstrap replace:
assert:
7.42user 1.00system 0:08.52elapsed 98%CPU (0avgtext+0avgdata 91656maxresident)k
0inputs+546400outputs (40major+97737minor)pagefaults 0swaps
opt:
2.49user 0.98system 0:03.49elapsed 99%CPU (0avgtext+0avgdata 88700maxresident)k
0inputs+546400outputs (40major+97728minor)pagefaults 0swaps
everything in bootstrap:
assert:
6.31user 0.94system 0:07.35elapsed 98%CPU (0avgtext+0avgdata 97812maxresident)k
0inputs+547360outputs (30major+88617minor)pagefaults 0swaps
opt:
2.42user 0.85system 0:03.28elapsed 99%CPU (0avgtext+0avgdata 94572maxresident)k
0inputs+547360outputs (30major+83712minor)pagefaults 0swaps
optimize WAL in bootstrap:
assert:
6.26user 0.96system 0:07.29elapsed 99%CPU (0avgtext+0avgdata 97844maxresident)k
0inputs+547360outputs (30major+88586minor)pagefaults 0swaps
opt:
2.43user 0.80system 0:03.24elapsed 99%CPU (0avgtext+0avgdata 94436maxresident)k
0inputs+547360outputs (30major+83664minor)pagefaults 0swaps
remote isatty in bootstrap:
assert:
6.15user 0.83system 0:06.99elapsed 99%CPU (0avgtext+0avgdata 97832maxresident)k
0inputs+465120outputs (30major+88559minor)pagefaults 0swaps
opt:
2.28user 0.85system 0:03.14elapsed 99%CPU (0avgtext+0avgdata 94604maxresident)k
0inputs+465120outputs (30major+83728minor)pagefaults 0swaps
That's IMO not bad.
On windows I see a higher gains, which makes sense, because filesystem IO is
slower. Freebsd as well, but the variance is oddly high, so I might be doing
something wrong.
The main reason I like this however isn't the speedup itself, but that after
this initdb doesn't depend on single user mode at all anymore.
About the prototype:
- Most of the bootstrap SQL is executed from bootstrap.c itself. But some
still comes from the client. E.g. password, a few information_schema
details and the database / authid changes.
- To execute the sql I mostly used extension.c's
read_whole_file()/execute_sql_string(). But VACUUM, CREATE DATABASE require
all the transactional hacks in portal.c etc. So I wrapped
exec_simple_query() for that phase.
Might be better to just call vacuum.c / database.c directly.
- for indexed relcache access to work the phase of
RelationCacheInitializePhase3() that's initially skipped needs to be
executed. I hacked that up by adding a RelationCacheInitializePhase3b() that
bootstrap.c can call, but that's obviously too ugly to live.
- InvalidateSystemCaches() is needed after bki processing. Otherwise I see an
"row is too big:" error. Didn't investigate yet.
- I definitely removed some validation that we'd probably want. But that seems
something to care about later...
- 0004 prevents a fair bit of WAL from being written. While XLogInsert did
some of that, it didn't block FPIs, which obviously are bulky. This reduces
WAL from ~5MB to ~100kB.
There's quite a bit of further speedup potential:
- One bottleneck, particularly in optimized mode, is the handling of huge node
trees for views. strToNode() and nodeRead() are > 10% alone
- Enabling index access sometime during the postgres.bki processing would make
invalidation handling for subsequent indexes faster. Or maybe we can disable
a few more invalidations. Inval processing is >10%
- more than 10% (assert) / 7% (optimized) is spent in
compute_scalar_stats()->qsort_arg(). Something seems off with that to me.
Completely crazy?
Greetings,
Andres Freund
Attachments:
v1-0001-Set-default_toast_compression-lz4-if-available.patchtext/x-diff; charset=us-asciiDownload+13-2
v1-0002-initdb-move-token-replacing-in-postgres.bki-to-ba.patchtext/x-diff; charset=us-asciiDownload+72-65
v1-0003-initdb-perform-everything-during-boot-mostly-in-b.patchtext/x-diff; charset=us-asciiDownload+535-516
v1-0004-initdb-Optimize-WAL-writing-during-initdb.patchtext/x-diff; charset=us-asciiDownload+32-12
v1-0005-initdb-call-isatty-only-once-in-bootparse.y.patchtext/x-diff; charset=us-asciiDownload+7-2
On 20.02.22 01:39, Tom Lane wrote:
Hm, wouldn't it be less code to just use printf?
Meh --- it'd be different from the way we do it in the rest
of initdb, and it would not be "less code". Maybe it'd run
a shade faster, but I refuse to believe that that'd be
enough to matter.
There is a PG_CMD_PRINTF() that is used for that purpose.