GenBKI emits useless open;close for catalogs without rows
Hi,
Whilst looking at PostgreSQL's bootstrapping process, I noticed that
postgres.bki contains quite a few occurrances of the pattern "open
$catname; close $catname".
I suppose this pattern isn't too expensive, but according to my
limited research a combined open+close cycle doens't do anything
meaningful, so it does waste some CPU cycles in the process.
The attached patch 1 removes the occurances of those combined
open/close statements in postgresql.bki. Locally it passes
check-world, so I assume that opening and closing a table is indeed
not required for initializing a data-less catalog during
bootstrapping.
A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Attachments:
0002-Remove-the-bki-close-command.patchapplication/octet-stream; name=0002-Remove-the-bki-close-command.patchDownload
From 65b21f740fc09dcf80ed60a77a3a64d85605a203 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 1 Sep 2023 19:23:17 +0200
Subject: [PATCH 2/2] Remove the bki 'close' command.
Bootstrapping works without manually closing relations because they are
auto-closed when a different relation is opened, and apparently leaking
a single opened relation at the end of the runway isn't a problem in the
currently available tests.
---
src/backend/bootstrap/bootparse.y | 13 +------------
src/backend/bootstrap/bootscanner.l | 2 --
src/backend/catalog/genbki.pl | 12 ------------
3 files changed, 1 insertion(+), 26 deletions(-)
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 81a1b7bfec..e405ed1c7b 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -99,7 +99,7 @@ static int num_columns_read = 0;
/* NULLVAL is a reserved keyword */
%token NULLVAL
/* All the rest are unreserved, and should be handled in boot_ident! */
-%token <kw> OPEN XCLOSE XCREATE INSERT_TUPLE
+%token <kw> OPEN XCREATE INSERT_TUPLE
%token <kw> XDECLARE INDEX ON USING XBUILD INDICES UNIQUE XTOAST
%token <kw> OBJ_ID XBOOTSTRAP XSHARED_RELATION XROWTYPE_OID
%token <kw> XFORCE XNOT XNULL
@@ -120,7 +120,6 @@ Boot_Queries:
Boot_Query :
Boot_OpenStmt
- | Boot_CloseStmt
| Boot_CreateStmt
| Boot_InsertStmt
| Boot_DeclareIndexStmt
@@ -138,15 +137,6 @@ Boot_OpenStmt:
}
;
-Boot_CloseStmt:
- XCLOSE boot_ident
- {
- do_start();
- closerel($2);
- do_end();
- }
- ;
-
Boot_CreateStmt:
XCREATE boot_ident oidspec optbootstrap optsharedrelation optrowtypeoid LPAREN
{
@@ -467,7 +457,6 @@ boot_column_val:
boot_ident:
ID { $$ = $1; }
| OPEN { $$ = pstrdup($1); }
- | XCLOSE { $$ = pstrdup($1); }
| XCREATE { $$ = pstrdup($1); }
| INSERT_TUPLE { $$ = pstrdup($1); }
| XDECLARE { $$ = pstrdup($1); }
diff --git a/src/backend/bootstrap/bootscanner.l b/src/backend/bootstrap/bootscanner.l
index 6a9d4193f2..99cb7861c9 100644
--- a/src/backend/bootstrap/bootscanner.l
+++ b/src/backend/bootstrap/bootscanner.l
@@ -73,8 +73,6 @@ sid \'([^']|\'\')*\'
open { boot_yylval.kw = "open"; return OPEN; }
-close { boot_yylval.kw = "close"; return XCLOSE; }
-
create { boot_yylval.kw = "create"; return XCREATE; }
OID { boot_yylval.kw = "OID"; return OBJ_ID; }
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 67d63864b3..814d945cb1 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -549,8 +549,6 @@ EOM
{
print $def $line;
}
-
- my $opened = 0;
# Open it, unless it's a bootstrap catalog (create bootstrap does this
# automatically)
@@ -559,13 +557,8 @@ EOM
if (($catalog_data{$catname} || 0) > 0 || $catname eq 'pg_attribute')
{
print $bki "open $catname\n";
- $opened = 1;
}
}
- else
- {
- $opened = 1;
- }
# For pg_attribute.h, we generate data entries ourselves.
if ($catname eq 'pg_attribute')
@@ -681,11 +674,6 @@ EOM
}
}
- if ($opened eq 1)
- {
- print $bki "close $catname\n";
- }
-
printf $def "\n#endif\t\t\t\t\t\t\t/* %s_D_H */\n", uc $catname;
# Close and rename definition header
--
2.40.1
0001-Stop-emitting-open-nodata-close-patterns-in-genbki.p.patchapplication/octet-stream; name=0001-Stop-emitting-open-nodata-close-patterns-in-genbki.p.patchDownload
From 4410ff8859e4239467ba29bda8ded0fd4603d175 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Wed, 11 Jan 2023 02:13:04 +0100
Subject: [PATCH 1/2] Stop emitting open/nodata/close patterns in genbki.pl
Although opening and immediately closing the relation is not that
expensive, it's still cheaper to not touch it at all.
---
src/backend/catalog/genbki.pl | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 4a7205472c..67d63864b3 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -549,12 +549,22 @@ EOM
{
print $def $line;
}
+
+ my $opened = 0;
# Open it, unless it's a bootstrap catalog (create bootstrap does this
# automatically)
if (!$catalog->{bootstrap})
{
- print $bki "open $catname\n";
+ if (($catalog_data{$catname} || 0) > 0 || $catname eq 'pg_attribute')
+ {
+ print $bki "open $catname\n";
+ $opened = 1;
+ }
+ }
+ else
+ {
+ $opened = 1;
}
# For pg_attribute.h, we generate data entries ourselves.
@@ -671,7 +681,11 @@ EOM
}
}
- print $bki "close $catname\n";
+ if ($opened eq 1)
+ {
+ print $bki "close $catname\n";
+ }
+
printf $def "\n#endif\t\t\t\t\t\t\t/* %s_D_H */\n", uc $catname;
# Close and rename definition header
--
2.40.1
On 2023-Sep-01, Matthias van de Meent wrote:
A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.
Hmm, what happens with the last relation in the bootstrap process? Is
closerel() called via some other path for that one?
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)
On Fri, 1 Sept 2023 at 19:43, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-Sep-01, Matthias van de Meent wrote:
A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.Hmm, what happens with the last relation in the bootstrap process? Is
closerel() called via some other path for that one?
There is a final cleanup() call that closes the last open boot_reldesc
relation (if any) at the end of BootstrapModeMain, after boot_yyparse
has completed and its changes have been committed.
- Matthias
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Sep-01, Matthias van de Meent wrote:
A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.
Hmm, what happens with the last relation in the bootstrap process? Is
closerel() called via some other path for that one?
Taking a quick census of existing closerel() callers: there is
cleanup() in bootstrap.c, but it's called uncomfortably late
and outside any transaction, so I misdoubt that it works
properly if asked to actually shoulder any responsibility.
(A little code reshuffling could fix that.)
There are also a couple of low-level elog warnings in CREATE
that would likely get triggered, though I suppose we could just
remove those elogs.
I guess my reaction to this patch is "why bother?". It seems
unlikely to yield any measurable benefit, though of course
that guess could be wrong.
regards, tom lane
On Fri, 1 Sept 2023 at 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Sep-01, Matthias van de Meent wrote:
A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.Hmm, what happens with the last relation in the bootstrap process? Is
closerel() called via some other path for that one?Taking a quick census of existing closerel() callers: there is
cleanup() in bootstrap.c, but it's called uncomfortably late
and outside any transaction, so I misdoubt that it works
properly if asked to actually shoulder any responsibility.
(A little code reshuffling could fix that.)
There are also a couple of low-level elog warnings in CREATE
that would likely get triggered, though I suppose we could just
remove those elogs.
Yes, that should be easy to fix.
I guess my reaction to this patch is "why bother?". It seems
unlikely to yield any measurable benefit, though of course
that guess could be wrong.
There is a small but measurable decrease in size of the generated bki
(2kb with both patches, on an initial 945kB), and there is some
related code that can be eliminated. If that's not worth bothering,
then I can drop the patch. Otherwise, I can update the patch to do the
cleanup that was within the transaction boundaries at the end of
boot_yyparse.
If decreasing the size of postgres.bki is not worth the effort, I'll
drop any effort on doing so, but considering that it is about 1MB of
our uncompressed distributables, I'd say decreases in size are worth
the effort, most of the time.
Kind regards,
Matthias van de Meent
On Tue, 12 Sept 2023 at 17:51, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
On Fri, 1 Sept 2023 at 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
On 2023-Sep-01, Matthias van de Meent wrote:
A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.Hmm, what happens with the last relation in the bootstrap process? Is
closerel() called via some other path for that one?Taking a quick census of existing closerel() callers: there is
cleanup() in bootstrap.c, but it's called uncomfortably late
and outside any transaction, so I misdoubt that it works
properly if asked to actually shoulder any responsibility.
(A little code reshuffling could fix that.)
There are also a couple of low-level elog warnings in CREATE
that would likely get triggered, though I suppose we could just
remove those elogs.Yes, that should be easy to fix.
I guess my reaction to this patch is "why bother?". It seems
unlikely to yield any measurable benefit, though of course
that guess could be wrong.There is a small but measurable decrease in size of the generated bki
(2kb with both patches, on an initial 945kB), and there is some
related code that can be eliminated. If that's not worth bothering,
then I can drop the patch. Otherwise, I can update the patch to do the
cleanup that was within the transaction boundaries at the end of
boot_yyparse.If decreasing the size of postgres.bki is not worth the effort, I'll
drop any effort on doing so, but considering that it is about 1MB of
our uncompressed distributables, I'd say decreases in size are worth
the effort, most of the time.
With the attached patch I've see a significant decrease in the size of
postgres.bki of about 25%, and a likely related decrease in wall clock
time spent in the bootstrap transaction: with timestamp logs inserted
around the boot_yyparse() transaction the measured time went from
around 49 ms on master to around 45 ms patched. In the grand scheme of
initdb that might not be a lot of time (initdb takes about 73ms
locally with syncing disabled) but it is a nice gain in performance.
Comparison:
master @ 9c13b681
$ du -b pg_install/share/postgres.bki
945220
$ initdb --no-instructions --auth=md5 --pwfile pwfile -N -D ~/test-dbinit/
[...]
2023-09-16 02:22:57.339 CEST [10422] LOG: Finished bootstrapping:
to_start: 10 ms, transaction: 49 ms, finishing: 1 ms, total: 59 ms
[...]
patched
$ du -b pg_install/share/postgres.bki
702574
$ initdb --no-instructions --auth=md5 --pwfile pwfile -N -D ~/test-dbinit/
[...]
2023-09-16 02:25:57.664 CEST [15645] LOG: Finished bootstrapping:
to_start: 10 ms, transaction: 45 ms, finishing: 1 ms, total: 54 ms
[...]
Various methods of reducing the size of postgres.bki were applied, as
detailed in the patch's commit message. I believe the current output
is still quite human readable.
There are other potential avenues for further reducing the bki size,
e.g. through using smaller generated OIDs (reducing the number of
characters used per OID), applying RLE on sequential NULLs (there are
3k+ occurances of /( __){2,10}/ in the generated bki file remaining),
and other tricks, but several of those are likely to be detrimental to
the readability and manual verifiability of the bki.
Kind regards,
Matthias van de Meent
Attachments:
v2-0001-Update-BKI-syntax-bootstrap-performance.patchapplication/octet-stream; name=v2-0001-Update-BKI-syntax-bootstrap-performance.patchDownload
From 3d4683d2682dc3bdcef532489e851ad16ec9262d Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 15 Sep 2023 22:25:40 +0200
Subject: [PATCH v2] Update BKI syntax, bootstrap performance
- NULL values are 56% smaller (from 7 bytes to 3 bytes each)
- NULL-only postfixes are not emitted into the BKI file
- Inserts are batched into inserts of up to 1000 rows per catalog, reducing overhead of the 'INSERT' statement
- The insert tuple descriptor is reused across tuple insertions, reducing insert overhead
- OPEN/CLOSE are now implicit and depend on context: no need to manually open/close the relation in the bki.
- Removed some wasteful spaces from the generated file.
---
doc/src/sgml/bki.sgml | 13 ++---
src/backend/bootstrap/bootparse.y | 73 ++++++++++++++++----------
src/backend/bootstrap/bootscanner.l | 5 +-
src/backend/bootstrap/bootstrap.c | 13 +++--
src/backend/catalog/genbki.pl | 80 +++++++++++++++++++++++------
src/include/bootstrap/bootstrap.h | 1 +
6 files changed, 127 insertions(+), 58 deletions(-)
diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index f71644e398..5c057d0f8a 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -247,9 +247,9 @@
<listitem>
<para>
- Null values are represented by <literal>_null_</literal>.
- (Note that there is no way to create a value that is just that
- string.)
+ Null values are represented by <literal>_null_</literal> and
+ <literal>__</literal>. (Note that there is no way to create a value that
+ is just either of these strings.)
</para>
</listitem>
@@ -1068,11 +1068,8 @@ $ perl rewrite_dat_with_prokind.pl pg_proc.dat
of type <type>oid</type>, <type>int4</type> and <type>text</type>,
respectively, and insert two rows into the table:
<programlisting>
-create test_table 420 (oid = oid, cola = int4, colb = text)
-open test_table
-insert ( 421 1 'value 1' )
-insert ( 422 2 _null_ )
-close test_table
+create test_table 420 open (oid = oid, cola = int4, colb = text)
+insert ( 421 1 'value 1' ), ( 422 2 _null_ )
</programlisting>
</para>
</sect1>
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 81a1b7bfec..c911e26018 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -62,6 +62,7 @@ do_end(void)
/* Reclaim memory allocated while processing this line */
MemoryContextSwitchTo(CurTransactionContext);
MemoryContextReset(per_line_ctx);
+ tupDesc = NULL;
CHECK_FOR_INTERRUPTS(); /* allow SIGINT to kill bootstrap run */
if (isatty(0))
{
@@ -91,7 +92,7 @@ static int num_columns_read = 0;
%type <list> boot_index_params
%type <ielem> boot_index_param
%type <str> boot_ident
-%type <ival> optbootstrap optsharedrelation boot_column_nullness
+%type <ival> optbootstrap optsharedrelation optopen boot_column_nullness
%type <oidval> oidspec optrowtypeoid
%token <str> ID
@@ -110,6 +111,12 @@ static int num_columns_read = 0;
TopLevel:
Boot_Queries
+ {
+ if (boot_reldesc)
+ {
+ closerel(NULL);
+ }
+ }
|
;
@@ -119,9 +126,7 @@ Boot_Queries:
;
Boot_Query :
- Boot_OpenStmt
- | Boot_CloseStmt
- | Boot_CreateStmt
+ Boot_CreateStmt
| Boot_InsertStmt
| Boot_DeclareIndexStmt
| Boot_DeclareUniqueIndexStmt
@@ -129,26 +134,8 @@ Boot_Query :
| Boot_BuildIndsStmt
;
-Boot_OpenStmt:
- OPEN boot_ident
- {
- do_start();
- boot_openrel($2);
- do_end();
- }
- ;
-
-Boot_CloseStmt:
- XCLOSE boot_ident
- {
- do_start();
- closerel($2);
- do_end();
- }
- ;
-
Boot_CreateStmt:
- XCREATE boot_ident oidspec optbootstrap optsharedrelation optrowtypeoid LPAREN
+ XCREATE boot_ident oidspec optbootstrap optsharedrelation optrowtypeoid optopen LPAREN
{
do_start();
numattr = 0;
@@ -192,7 +179,6 @@ Boot_CreateStmt:
if (boot_reldesc)
{
- elog(DEBUG4, "create bootstrap: warning, open relation exists, closing first");
closerel(NULL);
}
@@ -240,6 +226,12 @@ Boot_CreateStmt:
NULL);
elog(DEBUG4, "relation created with OID %u", id);
}
+
+ if ($7)
+ {
+ boot_openrel($2);
+ }
+
do_end();
}
;
@@ -251,15 +243,37 @@ Boot_InsertStmt:
elog(DEBUG4, "inserting row");
num_columns_read = 0;
}
+ Boot_InsertTuples
+ {
+ num_columns_read = 0;
+ do_end();
+ }
+ ;
+
+Boot_InsertTuples:
+ Boot_InsertTuple
+ | Boot_InsertTuples COMMA
+ {
+ elog(DEBUG4, "inserting row");
+ num_columns_read = 0;
+ }
+ Boot_InsertTuple
+ ;
+
+Boot_InsertTuple:
LPAREN boot_column_val_list RPAREN
{
+ if (boot_reldesc == NULL)
+ elog(FATAL, "relation not open");
+
+ while (num_columns_read < numattr)
+ InsertOneNull(num_columns_read++);
+
if (num_columns_read != numattr)
elog(ERROR, "incorrect number of columns in row (expected %d, got %d)",
numattr, num_columns_read);
- if (boot_reldesc == NULL)
- elog(FATAL, "relation not open");
InsertOneTuple();
- do_end();
+ num_columns_read = 0;
}
;
@@ -427,6 +441,11 @@ optrowtypeoid:
| { $$ = InvalidOid; }
;
+optopen:
+ OPEN { $$ = true; }
+ | { $$ = false; }
+ ;
+
boot_column_list:
boot_column_def
| boot_column_list COMMA boot_column_def
diff --git a/src/backend/bootstrap/bootscanner.l b/src/backend/bootstrap/bootscanner.l
index 6a9d4193f2..6564469951 100644
--- a/src/backend/bootstrap/bootscanner.l
+++ b/src/backend/bootstrap/bootscanner.l
@@ -60,8 +60,8 @@ sid \'([^']|\'\')*\'
/*
* Keyword tokens return the keyword text (as a constant string) in boot_yylval.kw,
* just in case that's needed because we want to treat the keyword as an
- * unreserved identifier. Note that _null_ is not treated as a keyword
- * for this purpose; it's the one "reserved word" in the bootstrap syntax.
+ * unreserved identifier. Note that _null_ and __ are not treated as a keyword
+ * for this purpose; they're the only "reserved words" in the bootstrap syntax.
*
* Notice that all the keywords are case-sensitive, and for historical
* reasons some must be upper case.
@@ -85,6 +85,7 @@ rowtype_oid { boot_yylval.kw = "rowtype_oid"; return XROWTYPE_OID; }
insert { boot_yylval.kw = "insert"; return INSERT_TUPLE; }
_null_ { return NULLVAL; }
+__ { return NULLVAL; }
"," { return COMMA; }
"=" { return EQUALS; }
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 4cc2efa95c..90c5907d30 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -64,7 +64,7 @@ Relation boot_reldesc; /* current relation descriptor */
Form_pg_attribute attrtypes[MAXATTR]; /* points to attribute info */
int numattr; /* number of attributes for cur. rel */
-
+TupleDesc tupDesc; /* cached tupDesc for cur. rel, or NULL */
/*
* Basic information associated with each type. This is used before
@@ -506,7 +506,6 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
if (boot_reldesc != NULL)
{
- elog(WARNING, "no open relations allowed with CREATE command");
closerel(NULL);
}
@@ -612,14 +611,18 @@ void
InsertOneTuple(void)
{
HeapTuple tuple;
- TupleDesc tupDesc;
int i;
elog(DEBUG4, "inserting row with %d columns", numattr);
- tupDesc = CreateTupleDesc(numattr, attrtypes);
+ /*
+ * tupDesc will be freed when the memory context is reset,
+ * and will be reset to NULL.
+ */
+ if (tupDesc == NULL)
+ tupDesc = CreateTupleDesc(numattr, attrtypes);
+
tuple = heap_form_tuple(tupDesc, values, Nulls);
- pfree(tupDesc); /* just free's tupDesc, not the attrtypes */
simple_heap_insert(boot_reldesc, tuple);
heap_freetuple(tuple);
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 4a7205472c..1e6893792c 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -28,6 +28,8 @@ my $include_path;
my $num_errors = 0;
+my $insert_batch_size = 1000;
+
GetOptions(
'output:s' => \$output_path,
'set-version:s' => \$major_version,
@@ -503,9 +505,19 @@ EOM
. $catalog->{bootstrap}
. $catalog->{rowtype_oid_clause};
+ # Open it, unless it's a bootstrap catalog (create bootstrap does this
+ # automatically)
+ if (!$catalog->{bootstrap})
+ {
+ if (($catalog_data{$catname} || 0) > 0)
+ {
+ print $bki " open";
+ }
+ }
+
my $first = 1;
- print $bki "\n (\n";
+ print $bki "\n(\n";
my $schema = $catalog->{columns};
my %attnames;
my $attnum = 0;
@@ -521,7 +533,7 @@ EOM
# Emit column definitions
if (!$first)
{
- print $bki " ,\n";
+ print $bki ",\n";
}
$first = 0;
@@ -539,7 +551,7 @@ EOM
# Emit Anum_* constants
printf $def "#define Anum_%s_%s %s\n", $catname, $attname, $attnum;
}
- print $bki "\n )\n";
+ print $bki "\n)\n";
# Emit Natts_* constant
print $def "\n#define Natts_$catname $attnum\n\n";
@@ -550,19 +562,13 @@ EOM
print $def $line;
}
- # Open it, unless it's a bootstrap catalog (create bootstrap does this
- # automatically)
- if (!$catalog->{bootstrap})
- {
- print $bki "open $catname\n";
- }
-
# For pg_attribute.h, we generate data entries ourselves.
if ($catname eq 'pg_attribute')
{
gen_pg_attribute($schema);
}
+ my $rows = 0;
# Ordinary catalog with a data file
foreach my $row (@{ $catalog_data{$catname} })
{
@@ -618,7 +624,7 @@ EOM
}
elsif ($atttype eq '_oid')
{
- if ($bki_values{$attname} ne '_null_')
+ if ($bki_values{$attname} ne '_null_' && $bki_values{$attname} ne '__')
{
$bki_values{$attname} =~ s/[{}]//g;
@lookupnames = split /,/, $bki_values{$attname};
@@ -654,7 +660,7 @@ EOM
}
# Write to postgres.bki
- print_bki_insert(\%bki_values, $schema);
+ print_bki_insert(\%bki_values, $schema, ($rows++ % $insert_batch_size) eq 0);
# Emit OID symbol
if (defined $bki_values{oid_symbol})
@@ -671,7 +677,6 @@ EOM
}
}
- print $bki "close $catname\n";
printf $def "\n#endif\t\t\t\t\t\t\t/* %s_D_H */\n", uc $catname;
# Close and rename definition header
@@ -833,6 +838,8 @@ sub gen_pg_attribute
push @attnames, $column->{name};
}
+ my $attrs_processed = 0;
+
foreach my $table_name (@catnames)
{
my $table = $catalogs{$table_name};
@@ -862,7 +869,10 @@ sub gen_pg_attribute
&& ($row{attlen} eq 'NAMEDATALEN' || $row{attlen} > 0));
# If it's bootstrapped, put an entry in postgres.bki.
- print_bki_insert(\%row, $schema) if $table->{bootstrap};
+ if ($table->{bootstrap})
+ {
+ print_bki_insert(\%row, $schema, ($attrs_processed++ % $insert_batch_size) eq 0);
+ }
# Store schemapg entries for later.
morph_row_for_schemapg(\%row, $schema);
@@ -892,7 +902,7 @@ sub gen_pg_attribute
$row{attstattarget} = '0';
morph_row_for_pgattr(\%row, $schema, $attr, 1);
- print_bki_insert(\%row, $schema);
+ print_bki_insert(\%row, $schema, ($attrs_processed++ % $insert_batch_size) eq 0);
}
}
}
@@ -964,6 +974,8 @@ sub print_bki_insert
{
my $row = shift;
my $schema = shift;
+ my $insert = shift;
+ my $n_null_skipped = 0;
my @bki_values;
@@ -977,6 +989,31 @@ sub print_bki_insert
# since that represents a NUL char in C code.
$bki_value = '' if $bki_value eq '\0';
+ # We have many nulls in our catalogs, which would use 6 bytes each if
+ # stored as _null_. We compress them by printing __ instead.
+ # While not as readable, the cost of each NULL column's values is
+ # decreased by 57%.
+ $bki_value = '__' if $bki_value eq '_null_';
+
+ # Various catalogs contain trailing NULL values. By defaulting
+ # missing trailing columns to NULL, we can skip emitting those columns,
+ # often removing several column values from the generated BKI.
+ if ($bki_value eq '__')
+ {
+ $n_null_skipped += 1;
+ next;
+ }
+ else
+ {
+ # We're not in a tail of only nulls, so write out the skipped null
+ # values.
+ while ($n_null_skipped > 0)
+ {
+ push @bki_values, '__';
+ $n_null_skipped -= 1;
+ }
+ }
+
# Handle single quotes by doubling them, because that's what the
# bootstrap scanner requires. We do not process backslashes
# specially; this allows escape-string-style backslash escapes
@@ -991,7 +1028,18 @@ sub print_bki_insert
push @bki_values, $bki_value;
}
- printf $bki "insert ( %s )\n", join(' ', @bki_values);
+
+ # The "insert" part of the statement is quite large at 6 bytes, so we
+ # combine multiple inserts for the same catalog into one large statement.
+ # This again saves a few bytes per statement.
+ if ($insert != 0)
+ {
+ printf $bki "insert (%s)\n", join(' ', @bki_values);
+ }
+ else
+ {
+ printf $bki ", (%s)\n", join(' ', @bki_values);
+ }
return;
}
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index e1cb73c5f2..0228195709 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -30,6 +30,7 @@
extern PGDLLIMPORT Relation boot_reldesc;
extern PGDLLIMPORT Form_pg_attribute attrtypes[MAXATTR];
extern PGDLLIMPORT int numattr;
+extern PGDLLIMPORT TupleDesc tupDesc;
extern void BootstrapModeMain(int argc, char *argv[], bool check_only) pg_attribute_noreturn();
--
2.40.1
On 18/09/2023 17:50, Matthias van de Meent wrote:
(initdb takes about 73ms locally with syncing disabled)
That's impressive. It takes about 600 ms on my laptop. Of which about
140 ms goes into processing the BKI file. And that's with "initdb
-no-sync" option.
Various methods of reducing the size of postgres.bki were applied, as
detailed in the patch's commit message. I believe the current output
is still quite human readable.
Overall this does not seem very worthwhile to me.
One thing caught my eye though: We currently have an "open" command
after every "create". Except for bootstrap relations; creating a
bootstrap relation opens it implicitly. That seems like a weird
inconsistency. If we make "create" to always open the relation, we can
both make it more consistent and save a few lines. That's maybe worth
doing, per the attached. It removes the "open" command altogether, as
it's not needed anymore.
Looking at "perf" profile of initdb, I also noticed that a small but
measurable amount of time is spent in the "isatty(0)" call in do_end().
Does anyone care about doing bootstrap mode interactively? We could
probably remove that.
--
Heikki Linnakangas
Neon (https://neon.tech)
Attachments:
remove-bootstrap-open.patchtext/x-patch; charset=UTF-8; name=remove-bootstrap-open.patchDownload
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 81a1b7bfec..043ad9325f 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -99,7 +99,7 @@ static int num_columns_read = 0;
/* NULLVAL is a reserved keyword */
%token NULLVAL
/* All the rest are unreserved, and should be handled in boot_ident! */
-%token <kw> OPEN XCLOSE XCREATE INSERT_TUPLE
+%token <kw> XCLOSE XCREATE INSERT_TUPLE
%token <kw> XDECLARE INDEX ON USING XBUILD INDICES UNIQUE XTOAST
%token <kw> OBJ_ID XBOOTSTRAP XSHARED_RELATION XROWTYPE_OID
%token <kw> XFORCE XNOT XNULL
@@ -119,8 +119,7 @@ Boot_Queries:
;
Boot_Query :
- Boot_OpenStmt
- | Boot_CloseStmt
+ Boot_CloseStmt
| Boot_CreateStmt
| Boot_InsertStmt
| Boot_DeclareIndexStmt
@@ -129,20 +128,11 @@ Boot_Query :
| Boot_BuildIndsStmt
;
-Boot_OpenStmt:
- OPEN boot_ident
- {
- do_start();
- boot_openrel($2);
- do_end();
- }
- ;
-
Boot_CloseStmt:
XCLOSE boot_ident
{
do_start();
- closerel($2);
+ boot_closerel($2);
do_end();
}
;
@@ -185,17 +175,17 @@ Boot_CreateStmt:
*/
mapped_relation = ($4 || shared_relation);
+ if (boot_reldesc)
+ {
+ elog(DEBUG4, "create: warning, open relation exists, closing first");
+ boot_closerel(NULL);
+ }
+
if ($4)
{
TransactionId relfrozenxid;
MultiXactId relminmxid;
- if (boot_reldesc)
- {
- elog(DEBUG4, "create bootstrap: warning, open relation exists, closing first");
- closerel(NULL);
- }
-
boot_reldesc = heap_create($2,
PG_CATALOG_NAMESPACE,
shared_relation ? GLOBALTABLESPACE_OID : 0,
@@ -239,6 +229,7 @@ Boot_CreateStmt:
InvalidOid,
NULL);
elog(DEBUG4, "relation created with OID %u", id);
+ boot_openrel(id);
}
do_end();
}
@@ -466,7 +457,6 @@ boot_column_val:
boot_ident:
ID { $$ = $1; }
- | OPEN { $$ = pstrdup($1); }
| XCLOSE { $$ = pstrdup($1); }
| XCREATE { $$ = pstrdup($1); }
| INSERT_TUPLE { $$ = pstrdup($1); }
diff --git a/src/backend/bootstrap/bootscanner.l b/src/backend/bootstrap/bootscanner.l
index 6a9d4193f2..efb9e36090 100644
--- a/src/backend/bootstrap/bootscanner.l
+++ b/src/backend/bootstrap/bootscanner.l
@@ -71,8 +71,6 @@ sid \'([^']|\'\')*\'
%%
-open { boot_yylval.kw = "open"; return OPEN; }
-
close { boot_yylval.kw = "close"; return XCLOSE; }
create { boot_yylval.kw = "create"; return XCREATE; }
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 5810f8825e..2fc01f9d4d 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -409,13 +409,10 @@ bootstrap_signals(void)
* ----------------
*/
void
-boot_openrel(char *relname)
+boot_openrel(Oid id)
{
int i;
- if (strlen(relname) >= NAMEDATALEN)
- relname[NAMEDATALEN - 1] = '\0';
-
/*
* pg_type must be filled before any OPEN command is executed, hence we
* can now populate Typ if we haven't yet.
@@ -424,12 +421,12 @@ boot_openrel(char *relname)
populate_typ_list();
if (boot_reldesc != NULL)
- closerel(NULL);
+ boot_closerel(NULL);
- elog(DEBUG4, "open relation %s, attrsize %d",
- relname, (int) ATTRIBUTE_FIXED_PART_SIZE);
+ elog(DEBUG4, "open relation %u, attrsize %d",
+ id, (int) ATTRIBUTE_FIXED_PART_SIZE);
- boot_reldesc = table_openrv(makeRangeVar(NULL, relname, -1), NoLock);
+ boot_reldesc = table_open(id, NoLock);
numattr = RelationGetNumberOfAttributes(boot_reldesc);
for (i = 0; i < numattr; i++)
{
@@ -450,11 +447,11 @@ boot_openrel(char *relname)
}
/* ----------------
- * closerel
+ * boot_closerel
* ----------------
*/
void
-closerel(char *relname)
+boot_closerel(char *relname)
{
if (relname)
{
@@ -498,7 +495,7 @@ DefineAttr(char *name, char *type, int attnum, int nullness)
if (boot_reldesc != NULL)
{
elog(WARNING, "no open relations allowed with CREATE command");
- closerel(NULL);
+ boot_closerel(NULL);
}
if (attrtypes[attnum] == NULL)
@@ -687,7 +684,7 @@ static void
cleanup(void)
{
if (boot_reldesc != NULL)
- closerel(NULL);
+ boot_closerel(NULL);
}
/* ----------------
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 380bc23c82..3f4bf2ecc2 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -533,13 +533,6 @@ EOM
print $def $line;
}
- # Open it, unless it's a bootstrap catalog (create bootstrap does this
- # automatically)
- if (!$catalog->{bootstrap})
- {
- print $bki "open $catname\n";
- }
-
# For pg_attribute.h, we generate data entries ourselves.
if ($catname eq 'pg_attribute')
{
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index e1cb73c5f2..e7aa2ec0da 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -34,8 +34,8 @@ extern PGDLLIMPORT int numattr;
extern void BootstrapModeMain(int argc, char *argv[], bool check_only) pg_attribute_noreturn();
-extern void closerel(char *relname);
-extern void boot_openrel(char *relname);
+extern void boot_closerel(char *relname);
+extern void boot_openrel(Oid id);
extern void DefineAttr(char *name, char *type, int attnum, int nullness);
extern void InsertOneTuple(void);
Hi,
On 2023-09-19 21:05:41 +0300, Heikki Linnakangas wrote:
On 18/09/2023 17:50, Matthias van de Meent wrote:
(initdb takes about 73ms locally with syncing disabled)
That's impressive. It takes about 600 ms on my laptop. Of which about 140 ms
goes into processing the BKI file. And that's with "initdb -no-sync" option.
I think there must be a digit missing in Matthias' numbers.
Various methods of reducing the size of postgres.bki were applied, as
detailed in the patch's commit message. I believe the current output
is still quite human readable.Overall this does not seem very worthwhile to me.
Because the wins are too small?
FWIW, Making postgres.bki smaller and improving bootstrapping time does seem
worthwhile to me. But it doesn't seem quite right to handle the batching in
the file format, it should be on the server side, no?
We really should stop emitting WAL during initdb...
Looking at "perf" profile of initdb, I also noticed that a small but
measurable amount of time is spent in the "isatty(0)" call in do_end(). Does
anyone care about doing bootstrap mode interactively? We could probably
remove that.
Heh, yea, that's pretty pointless.
Greetings,
Andres Freund
On Tue, 19 Sept 2023 at 20:05, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 18/09/2023 17:50, Matthias van de Meent wrote:
(initdb takes about 73ms locally with syncing disabled)
That's impressive. It takes about 600 ms on my laptop. Of which about
140 ms goes into processing the BKI file. And that's with "initdb
-no-sync" option.
Hmm, yes, I misinterpreted my own benchmark setup, the actual value
would be somewhere around 365ms: I thought I was doing 50*50 runs in
one timed run, but really I was doing only 50 runs. TO add insult to
injury, I divided the total time taken by 250 instead of either 50 or
2500... Thanks for correcting me on that.
Various methods of reducing the size of postgres.bki were applied, as
detailed in the patch's commit message. I believe the current output
is still quite human readable.Overall this does not seem very worthwhile to me.
Reducing the size of redistributables sounds worthwhile to me, but if
none of these changes are worth the effort, then alright, nothing
gained, only time lost.
Looking at "perf" profile of initdb, I also noticed that a small but
measurable amount of time is spent in the "isatty(0)" call in do_end().
Does anyone care about doing bootstrap mode interactively? We could
probably remove that.
Yeah, that sounds like a good idea.
Kind regards,
Matthias van de Meent
On Fri, 22 Sept 2023 at 00:25, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2023-09-19 21:05:41 +0300, Heikki Linnakangas wrote:
On 18/09/2023 17:50, Matthias van de Meent wrote:
(initdb takes about 73ms locally with syncing disabled)
That's impressive. It takes about 600 ms on my laptop. Of which about 140 ms
goes into processing the BKI file. And that's with "initdb -no-sync" option.I think there must be a digit missing in Matthias' numbers.
Yes, kind of. The run was on 50 iterations, not the assumed 250.
Also note that the improved measurements were recorded inside the
boostrap-mode PostgreSQL instance, not inside the initdb that was
processing the postgres.bki file. So it might well be that I didn't
improve the total timing by much.
Various methods of reducing the size of postgres.bki were applied, as
detailed in the patch's commit message. I believe the current output
is still quite human readable.Overall this does not seem very worthwhile to me.
Because the wins are too small?
FWIW, Making postgres.bki smaller and improving bootstrapping time does seem
worthwhile to me. But it doesn't seem quite right to handle the batching in
the file format, it should be on the server side, no?
The main reason I did batching in the file format is to reduce the
storage overhead of the current one "INSERT" per row. Batching
improved that by replacing the token with a different construct, but
it's not neccessarily the only solution. The actual parser still
inserts the tuples one by one in the relation, as I didn't spend time
on making a simple_heap_insert analog for bulk insertions.
We really should stop emitting WAL during initdb...
I think it's quite elegant that we're able to bootstrap the relation
data of a new PostgreSQL cluster from the WAL generated in another
cluster, even if it is indeed a bit wasteful. I do see your point
though - the WAL shouldn't be needed if we're already fsyncing the
files to disk.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
On 19.09.23 20:05, Heikki Linnakangas wrote:
One thing caught my eye though: We currently have an "open" command
after every "create". Except for bootstrap relations; creating a
bootstrap relation opens it implicitly. That seems like a weird
inconsistency. If we make "create" to always open the relation, we can
both make it more consistent and save a few lines. That's maybe worth
doing, per the attached. It removes the "open" command altogether, as
it's not needed anymore.
This seems like a good improvement to me.
It would restrict the bootstrap language so that you can only manipulate
a table right after creating it, but I don't see why that wouldn't be
sufficient.
On 08.11.23 08:16, Peter Eisentraut wrote:
On 19.09.23 20:05, Heikki Linnakangas wrote:
One thing caught my eye though: We currently have an "open" command
after every "create". Except for bootstrap relations; creating a
bootstrap relation opens it implicitly. That seems like a weird
inconsistency. If we make "create" to always open the relation, we can
both make it more consistent and save a few lines. That's maybe worth
doing, per the attached. It removes the "open" command altogether, as
it's not needed anymore.This seems like a good improvement to me.
It would restrict the bootstrap language so that you can only manipulate
a table right after creating it, but I don't see why that wouldn't be
sufficient.
Then again, this sort of achieves the opposite of what Matthias was
aiming for: You are now forcing some relations to be opened even though
we will end up closing it right away.
(In any case, documentation in bki.sgml would need to be updated for
this patch.)
On Wed, 8 Nov 2023 at 12:50, Peter Eisentraut <peter@eisentraut.org> wrote:
On 08.11.23 08:16, Peter Eisentraut wrote:
On 19.09.23 20:05, Heikki Linnakangas wrote:
One thing caught my eye though: We currently have an "open" command
after every "create". Except for bootstrap relations; creating a
bootstrap relation opens it implicitly. That seems like a weird
inconsistency. If we make "create" to always open the relation, we can
both make it more consistent and save a few lines. That's maybe worth
doing, per the attached. It removes the "open" command altogether, as
it's not needed anymore.This seems like a good improvement to me.
It would restrict the bootstrap language so that you can only manipulate
a table right after creating it, but I don't see why that wouldn't be
sufficient.Then again, this sort of achieves the opposite of what Matthias was
aiming for: You are now forcing some relations to be opened even though
we will end up closing it right away.(In any case, documentation in bki.sgml would need to be updated for
this patch.)
I have changed the status of the patch to WOA, feel free to update the
status once Peter's documentation comments are addressed.
Regards,
Vignesh
On 22 Sep 2023, at 18:50, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
Hi Matthias!
This is kind reminder that this thread is waiting for your response.
CF entry [0]https://commitfest.postgresql.org/47/4544/ is in "Waiting on Author", I'll move it to July CF.
Thanks!
Best regards, Andrey Borodin.
On Mon, Apr 08, 2024 at 11:11:07AM +0300, Andrey M. Borodin wrote:
This is kind reminder that this thread is waiting for your response.
CF entry [0] is in "Waiting on Author", I'll move it to July CF.
Hmm, is that productive? This patch has been waiting on author since
the 1st of February, and it was already moved from the CF 2024-01 to
2024-03. It would make more sense to me to mark it as RwF, then
resubmit if there is still interest in working on this topic rather
than move it again.
My personal inner rule is there is enough ground for a patch to be
marked as RwF if it has been waiting on author since the middle of a
commit fest, which would be the 15th of March for CF 2024-03. This
lets two weeks to authors to react.
--
Michael
On Tue, Apr 9, 2024 at 12:03 AM Michael Paquier <michael@paquier.xyz> wrote:
Hmm, is that productive? This patch has been waiting on author since
the 1st of February, and it was already moved from the CF 2024-01 to
2024-03. It would make more sense to me to mark it as RwF, then
resubmit if there is still interest in working on this topic rather
than move it again.
Done.
--
Robert Haas
EDB: http://www.enterprisedb.com