Move bki file pre-processing from initdb to bootstrap
Hi All,
This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.
The flow of bki file processing will be as follows:
- In initdb gather the values used to replace the tokens in the bki file.
- Pass these values into postgres bootstrap startup using '-i' option
as key-value pairs.
- In bootstrap open the bki file (the bki file name was received as a
parameter).
- During the parsing of the bki file, replace the tokens received as
parameters with their values.
Related discussion can be found here:
/messages/by-id/20220216021219.ygzrtb3hd5bn7olz@alap3.anarazel.de
Note: Currently the patch breaks on windows due to placement of extra
quotes when passing parameters (Thanks to Thomas Munro for helping me
find that). Will follow up with v2 fixing the windows issues on
passing the parameters and format fixes.
Please review and provide feedback.
--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]
Attachments:
v1-0001-Move-the-pre-processing-for-tokens-in-bki-file-fr.patchapplication/octet-stream; name=v1-0001-Move-the-pre-processing-for-tokens-in-bki-file-fr.patchDownload+188-78
Krishnakumar R <kksrcv001@gmail.com> writes:
This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.
You haven't provided any iota of evidence why this would be an
improvement.
regards, tom lane
On 01.09.23 14:37, Tom Lane wrote:
Krishnakumar R <kksrcv001@gmail.com> writes:
This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.You haven't provided any iota of evidence why this would be an
improvement.
I had played with similar ideas in the past, because it would shave some
time of initdb, which would accumulate noticeably over a full test run.
But now with the initdb caching mechanism, I wonder whether this is
still needed.
Hi,
On 2023-09-01 14:59:57 +0200, Peter Eisentraut wrote:
On 01.09.23 14:37, Tom Lane wrote:
Krishnakumar R <kksrcv001@gmail.com> writes:
This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.You haven't provided any iota of evidence why this would be an
improvement.I had played with similar ideas in the past, because it would shave some
time of initdb, which would accumulate noticeably over a full test run.But now with the initdb caching mechanism, I wonder whether this is still
needed.
I think it's still relevant - it's not just our own test infrastructure that
runs a lot of initdbs, it's also lots of projects using postgres.
The main reason I'd like to move this infrastructure to the backend is that I
really would like to get rid of single user mode. It adds complications all
over, it's barely tested, pointlessly hard to use. I wrote a rough prototype
of that a while back:
/messages/by-id/20220220214439.bhc35hhbaub6dush@alap3.anarazel.de
Greetings,
Andres Freund
On 01.09.23 10:01, Krishnakumar R wrote:
This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.
I did some rough performance tests on this. I get about a 10%
improvement on initdb run time, so this appears to have merit.
I wonder whether we can reduce the number of symbols that we need this
treatment for.
For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER,
FLOAT8PASSBYVAL, these are known at build time, so we could have
genbki.pl substitute them at build time.
The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder
whether we can eliminate the need for them. Right now, these are only
used in the bki entry for the template1 database. How about initdb
creates template0 first, with hardcoded default encoding, collation,
etc., and then creates template1 from that, using the normal CREATE
DATABASE command with the appropriate options. Or initdb could just run
an UPDATE on pg_database to put the right settings in place.
I don't like this part so much, because it adds like 4 more places each
of these variables is mentioned, which increases the mental and testing
overhead for dealing with these features (which are an area of active
development).
In general, it would be good if this could be factored a bit more so
each variable doesn't have to be hardcoded in so many places.
Some more detailed comments on the code:
+ boot_yylval.str = pstrdup(yytext);
+ sprintf(boot_yylval.str, "%d", NAMEDATALEN);
This is weird. You are first assigning the text and then overwriting it
with the numeric value?
You can also use boot_yylval.ival for storing numbers.
+ if (bootp_null(ebootp, ebootp->username)) return
NULLVAL;
Add proper line breaks in the code.
+bool bootp_null(extra_bootstrap_params *e, char *s)
Add a comment what this function is supposed to do.
This function could be static.
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fi:kr:X:-:")) != -1)
You should use an option letter that isn't already in use in one of the
other modes of "postgres". We try to keep those consistent.
New options should be added to the --help output (usage() in main.c).
+ elog(INFO, "Open bki file %s\n", bki_file);
+ boot_yyin = fopen(bki_file, "r");
Why is this needed? It already reads the bki file from stdin?
+ printfPQExpBuffer(&cmd, "\"%s\" --boot -X %d %s %s %s %s -i
%s=%s,%s=%s,%s=%s,"
+ "%s=%s,%s=%s,%s=%s,%s=%s,%s=%c",
+ backend_exec,
+ wal_segment_size_mb * (1024 * 1024),
+ boot_options, extra_options,
+ data_checksums ? "-k" : "",
+ debug ? "-d 5" : "",
This appears to undo some of the changes done in cccdbc5d95.
+#define BOOT_LC_COLLATE "lc_collate"
+#define BOOT_LC_CTYPE "lc_ctype"
+#define BOOT_ICU_LOCALE "icu_locale"
etc. This doesn't look particularly useful. You can just use the
strings directly.
Thank you, Peter, Andres and Tom for your comments and thoughts.
Hi Peter,
For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER,
FLOAT8PASSBYVAL, these are known at build time, so we could have
genbki.pl substitute them at build time.
I have modified the patch to use genbki to generate these ones during
build time.
The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder
whether we can eliminate the need for them. Right now, these are only
used in the bki entry for the template1 database. How about initdb
creates template0 first, with hardcoded default encoding, collation,
etc., and then creates template1 from that, using the normal CREATE
DATABASE command with the appropriate options. Or initdb could just run
an UPDATE on pg_database to put the right settings in place.
Using a combination of this suggestion and discussions Andres pointed
to in this thread, updated the patch to add placeholder values first
into template1 and then do UPDATEs in initdb itself.
You should use an option letter that isn't already in use in one of the
other modes of "postgres". We try to keep those consistent.New options should be added to the --help output (usage() in main.c).
Used a -b option under bootstrap mode and added help.
elog(INFO, "Open bki file %s\n", bki_file);
+ boot_yyin = fopen(bki_file, "r");Why is this needed? It already reads the bki file from stdin?
We no longer open the bki file in initdb and pass to postgres to parse
from stdin, instead we open the bki file directly in bootstrap and
pass the file stream to the parser. Hence the need to switch the yyin.
Have added a comment in the commit logs to capture this.
The version comparison has been moved from initdb to bootstrap. This
created some compatibility problems with windows tests. For now I kept
the version check to not have \n added, which worked fine and serves
the purpose. However hoping to have something better in v3 in addition
to addressing any other comments.
Please let me know your thoughts and review comments.
--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]
Show quoted text
On Tue, Sep 19, 2023 at 3:18 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 01.09.23 10:01, Krishnakumar R wrote:
This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.I did some rough performance tests on this. I get about a 10%
improvement on initdb run time, so this appears to have merit.I wonder whether we can reduce the number of symbols that we need this
treatment for.For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER,
FLOAT8PASSBYVAL, these are known at build time, so we could have
genbki.pl substitute them at build time.The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder
whether we can eliminate the need for them. Right now, these are only
used in the bki entry for the template1 database. How about initdb
creates template0 first, with hardcoded default encoding, collation,
etc., and then creates template1 from that, using the normal CREATE
DATABASE command with the appropriate options. Or initdb could just run
an UPDATE on pg_database to put the right settings in place.I don't like this part so much, because it adds like 4 more places each
of these variables is mentioned, which increases the mental and testing
overhead for dealing with these features (which are an area of active
development).In general, it would be good if this could be factored a bit more so
each variable doesn't have to be hardcoded in so many places.Some more detailed comments on the code:
+ boot_yylval.str = pstrdup(yytext); + sprintf(boot_yylval.str, "%d", NAMEDATALEN);This is weird. You are first assigning the text and then overwriting it
with the numeric value?You can also use boot_yylval.ival for storing numbers.
+ if (bootp_null(ebootp, ebootp->username)) return
NULLVAL;Add proper line breaks in the code.
+bool bootp_null(extra_bootstrap_params *e, char *s)
Add a comment what this function is supposed to do.
This function could be static.
+ while ((flag = getopt(argc, argv, "B:c:d:D:Fi:kr:X:-:")) != -1)
You should use an option letter that isn't already in use in one of the
other modes of "postgres". We try to keep those consistent.New options should be added to the --help output (usage() in main.c).
+ elog(INFO, "Open bki file %s\n", bki_file); + boot_yyin = fopen(bki_file, "r");Why is this needed? It already reads the bki file from stdin?
+ printfPQExpBuffer(&cmd, "\"%s\" --boot -X %d %s %s %s %s -i %s=%s,%s=%s,%s=%s," + "%s=%s,%s=%s,%s=%s,%s=%s,%s=%c", + backend_exec, + wal_segment_size_mb * (1024 * 1024), + boot_options, extra_options, + data_checksums ? "-k" : "", + debug ? "-d 5" : "",This appears to undo some of the changes done in cccdbc5d95.
+#define BOOT_LC_COLLATE "lc_collate" +#define BOOT_LC_CTYPE "lc_ctype" +#define BOOT_ICU_LOCALE "icu_locale"etc. This doesn't look particularly useful. You can just use the
strings directly.
Attachments:
v2-0001-Remove-BKI-file-token-pre-processing-logic-from-i.patchapplication/x-patch; name=v2-0001-Remove-BKI-file-token-pre-processing-logic-from-i.patchDownload+139-88
The version comparison has been moved from initdb to bootstrap. This
created some compatibility problems with windows tests. For now I kept
the version check to not have \n added, which worked fine and serves
the purpose. However hoping to have something better in v3 in addition
to addressing any other comments.
With help from Thomas, figured out that on windows fopen uses binary
mode in the backend which causes issues with EOL. Please find the
attached patch updated with a fix for this.
--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]
Attachments:
v3-0001-Remove-BKI-file-token-pre-processing-logic-from-i.patchapplication/octet-stream; name=v3-0001-Remove-BKI-file-token-pre-processing-logic-from-i.patchDownload+162-89
On 06.10.23 02:24, Krishnakumar R wrote:
elog(INFO, "Open bki file %s\n", bki_file);
+ boot_yyin = fopen(bki_file, "r");Why is this needed? It already reads the bki file from stdin?
We no longer open the bki file in initdb and pass to postgres to parse
from stdin, instead we open the bki file directly in bootstrap and
pass the file stream to the parser. Hence the need to switch the yyin.
Have added a comment in the commit logs to capture this.
Why this change? I mean, there is nothing wrong with it, but I don't
follow how changing from reading from stdin to reading from a named file
is related to moving the parameter substitution from initdb to the backend.
One effect of this is that we would now have two different ways initdb
interacts with the backend. In bootstrap mode, it reads from a named
file, and the second run (the one that loads the system views etc.)
reads from stdin. It's already confusing enough, so any further
divergence should be adequately explained.
On 17.10.23 03:32, Krishnakumar R wrote:
The version comparison has been moved from initdb to bootstrap. This
created some compatibility problems with windows tests. For now I kept
the version check to not have \n added, which worked fine and serves
the purpose. However hoping to have something better in v3 in addition
to addressing any other comments.With help from Thomas, figured out that on windows fopen uses binary
mode in the backend which causes issues with EOL. Please find the
attached patch updated with a fix for this.
I suggest that this patch set be split up into three incremental parts:
1. Move some build-time settings from initdb to postgres.bki.
2. The database locale handling.
3. The bki file handling.
Each of these topics really needs a separate detailed consideration.
Thank you for review, Peter.
Makes sense on the split part. Was starting to think in same lines, at the
end of last iteration. Will come back shortly.
On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut <peter@eisentraut.org>
wrote:
Show quoted text
On 17.10.23 03:32, Krishnakumar R wrote:
The version comparison has been moved from initdb to bootstrap. This
created some compatibility problems with windows tests. For now I kept
the version check to not have \n added, which worked fine and serves
the purpose. However hoping to have something better in v3 in addition
to addressing any other comments.With help from Thomas, figured out that on windows fopen uses binary
mode in the backend which causes issues with EOL. Please find the
attached patch updated with a fix for this.I suggest that this patch set be split up into three incremental parts:
1. Move some build-time settings from initdb to postgres.bki.
2. The database locale handling.
3. The bki file handling.Each of these topics really needs a separate detailed consideration.
On Sat, 11 Nov 2023 at 00:03, Krishnakumar R <kksrcv001@gmail.com> wrote:
Thank you for review, Peter.
Makes sense on the split part. Was starting to think in same lines, at the end of last iteration. Will come back shortly.
On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 17.10.23 03:32, Krishnakumar R wrote:
The version comparison has been moved from initdb to bootstrap. This
created some compatibility problems with windows tests. For now I kept
the version check to not have \n added, which worked fine and serves
the purpose. However hoping to have something better in v3 in addition
to addressing any other comments.With help from Thomas, figured out that on windows fopen uses binary
mode in the backend which causes issues with EOL. Please find the
attached patch updated with a fix for this.I suggest that this patch set be split up into three incremental parts:
1. Move some build-time settings from initdb to postgres.bki.
2. The database locale handling.
3. The bki file handling.Each of these topics really needs a separate detailed consideration.
The patch which you submitted has been awaiting your attention for
quite some time now. As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible. Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.
Regards,
Vignesh