Cutting initdb's runtime (Perl question embedded)
Andres mentioned, and I've confirmed locally, that a large chunk of
initdb's runtime goes into regprocin's brute-force lookups of function
OIDs from function names. The recent discussion about cutting TAP test
time prompted me to look into that question again. We had had some
grand plans for getting genbki.pl to perform the name-to-OID conversion
as part of a big rewrite, but since that project is showing few signs
of life, I'm thinking that a more localized performance fix would be
a good thing to look into. There seem to be a couple of plausible
routes to a fix:
1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki. The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields. The places that do need to do that approximate it like this:
# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields. Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace. We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{@attnames} = split /\s+/, $row->{bki_values};
We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly. I'm one of the world's worst
Perl programmers, but surely there's a way?
2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file. That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.
Thoughts?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/12/2017 04:12 PM, Tom Lane wrote:
1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki. The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields. The places that do need to do that approximate it like this:# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields. Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace. We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{@attnames} = split /\s+/, $row->{bki_values};We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly. I'm one of the world's worst
Perl programmers, but surely there's a way?2. Alternatively, we could teach bootstrap mode to build a hashtable
mapping proname to OID while it reads pg_proc data from postgres.bki,
and then make regprocin's bootstrap path consult the hashtable instead
of looking directly at the pg_proc file. That I'm quite sure is do-able,
but it seems like it's leaving money on the table compared to doing
the transformation earlier.Thoughts?
Looked at this an option 1 seems simple enough if I am not missing
something. I might hack something up later tonight. Either way I think
this improvement can be done separately from the proposed replacement of
the catalog header files. Trying to fix everything at once often leads
to nothing being fixed at all.
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-04-12 10:12:47 -0400, Tom Lane wrote:
Andres mentioned, and I've confirmed locally, that a large chunk of
initdb's runtime goes into regprocin's brute-force lookups of function
OIDs from function names. The recent discussion about cutting TAP test
time prompted me to look into that question again. We had had some
grand plans for getting genbki.pl to perform the name-to-OID conversion
as part of a big rewrite, but since that project is showing few signs
of life, I'm thinking that a more localized performance fix would be
a good thing to look into. There seem to be a couple of plausible
routes to a fix:1. The best thing would still be to make genbki.pl do the conversion,
and write numeric OIDs into postgres.bki. The core stumbling block
here seems to be that for most catalogs, Catalog.pm and genbki.pl
never really break down a DATA line into fields --- and we certainly
have got to do that, if we're going to replace the values of regproc
fields. The places that do need to do that approximate it like this:# To construct fmgroids.h and fmgrtab.c, we need to inspect some
# of the individual data fields. Just splitting on whitespace
# won't work, because some quoted fields might contain internal
# whitespace. We handle this by folding them all to a simple
# "xxx". Fortunately, this script doesn't need to look at any
# fields that might need quoting, so this simple hack is
# sufficient.
$row->{bki_values} =~ s/"[^"]*"/"xxx"/g;
@{$row}{@attnames} = split /\s+/, $row->{bki_values};We would need a bullet-proof, non-hack, preferably not too slow way to
split DATA lines into fields properly. I'm one of the world's worst
Perl programmers, but surely there's a way?
I've done something like 1) before:
http://archives.postgresql.org/message-id/20150221230839.GE2037%40awork2.anarazel.de
I don't think the speeds matters all that much, because we'll only do it
when generating the .bki file - a couple ms more or less won't matter
much.
I IIRC spent some more time to also load the data files from a different
format:
https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/sane-catalog
although that's presumably heavily outdated now.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/12/2017 05:00 PM, Andreas Karlsson wrote:
Looked at this an option 1 seems simple enough if I am not missing
something. I might hack something up later tonight. Either way I think
this improvement can be done separately from the proposed replacement of
the catalog header files. Trying to fix everything at once often leads
to nothing being fixed at all.
Here is my proof of concept patch. It does basically the same thing as
Andres's patch except that it handles quoted values a bit better and
does not try to support anything other than the regproc type.
The patch speeds up initdb without fsync from 0.80 seconds to 0.55
seconds, which is a nice speedup, while adding a negligible amount of
extra work on compilation.
Two things which could be improved in this patch if people deem it
important:
- Refactor the code to be more generic, like Andres patch, so it is
easier to add lookups for other data types.
- Write something closer to a real .bki parser to actually understand
quoted values and _null_ when parsing the data. Right now this patch
only splits each row into its fields (while being aware of quotes) but
does not attempt to remove quotes. The PoC patch treats "foo" and foo as
different.
Andreas
Attachments:
bki-regproc-oids-v1.patchtext/x-patch; name=bki-regproc-oids-v1.patchDownload+32-57
Andreas Karlsson <andreas@proxel.se> writes:
Here is my proof of concept patch. It does basically the same thing as
Andres's patch except that it handles quoted values a bit better and
does not try to support anything other than the regproc type.
The patch speeds up initdb without fsync from 0.80 seconds to 0.55
seconds, which is a nice speedup, while adding a negligible amount of
extra work on compilation.
I've pushed this with some mostly-cosmetic adjustments:
* created a single subroutine that understands how to split DATA lines,
rather than having several copies of the regex
* rearranged the code so that the data structure returned by
Catalog::Catalogs() isn't scribbled on (which was already
happening before your patch, but it seemed pretty ugly to me)
* stripped out the bootstrap-time name lookup code from all of reg*
not just regproc.
There's certainly lots more that could be done in the genbki code,
but I think all we can justify at this stage of the development
cycle is to get the low-hanging fruit for testing speedups.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-04-13 12:13:30 -0400, Tom Lane wrote:
Andreas Karlsson <andreas@proxel.se> writes:
Here is my proof of concept patch. It does basically the same thing as
Andres's patch except that it handles quoted values a bit better and
does not try to support anything other than the regproc type.The patch speeds up initdb without fsync from 0.80 seconds to 0.55
seconds, which is a nice speedup, while adding a negligible amount of
extra work on compilation.I've pushed this with some mostly-cosmetic adjustments:
Cool. I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough. They're now the biggest user of time.
- Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
Cool. I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough. They're now the biggest user of time.
Hm, biggest user of time in what workload? I've not noticed that
function particularly.
I agree that it doesn't seem like we need to spend a lot of time
cross-checking there, though. Maybe keep the code but #ifdef it
under some nondefault debugging symbol.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Cool. I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough. They're now the biggest user of time.Hm, biggest user of time in what workload? I've not noticed that
function particularly.
Just initdb. I presume it's because the catcaches will frequently be
relatively big there.
I agree that it doesn't seem like we need to spend a lot of time
cross-checking there, though. Maybe keep the code but #ifdef it
under some nondefault debugging symbol.
Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
so it gets compiled at least sometimes? Not a great fit, but ...
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Cool. I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough. They're now the biggest user of time.
Hm, biggest user of time in what workload? I've not noticed that
function particularly.
Just initdb. I presume it's because the catcaches will frequently be
relatively big there.
Hm. That ties into something I was looking at yesterday. The only
reason that function is called so much is that bootstrap mode runs a
separate transaction for *each line of the bki file* (cf do_start,
do_end in bootparse.y). Which seems pretty silly. I experimented
with collapsing all the transactions for consecutive DATA lines into
one transaction, but couldn't immediately make it work due to memory
management issues. I didn't try collapsing the entire run into a
single transaction, but maybe that would actually be easier, though
no doubt more wasteful of memory.
I agree that it doesn't seem like we need to spend a lot of time
cross-checking there, though. Maybe keep the code but #ifdef it
under some nondefault debugging symbol.
Hm, if we want to keep it, maybe tie it to CLOBBER_CACHE_ALWAYS or such,
so it gets compiled at least sometimes? Not a great fit, but ...
Don't like that, because CCA is by definition not the normal cache
behavior. It would make a bit of sense to tie it to CACHEDEBUG,
but as you say, it'd never get tested normally if we do that.
On the whole, though, we may be looking at diminishing returns here.
I just did some "perf" measurement of the overall "initdb" cycle,
and what I'm seeing suggests that bootstrap mode as such is now a
pretty small fraction of the overall cycle:
+ 51.07% 0.01% 28 postgres postgres [.] PostgresMain #
...
+ 13.52% 0.00% 0 postgres postgres [.] AuxiliaryProcessMain #
That says that the post-bootstrap steps are now the bulk of the time,
which agrees with naked-eye observation.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2017-04-13 12:56:14 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
Cool. I wonder if we also should remove AtEOXact_CatCache()'s
cross-checks - the resowner replacement has been in place for a while,
and seems robust enough. They're now the biggest user of time.Hm, biggest user of time in what workload? I've not noticed that
function particularly.Just initdb. I presume it's because the catcaches will frequently be
relatively big there.Hm. That ties into something I was looking at yesterday. The only
reason that function is called so much is that bootstrap mode runs a
separate transaction for *each line of the bki file* (cf do_start,
do_end in bootparse.y). Which seems pretty silly.
Indeed.
On the whole, though, we may be looking at diminishing returns here.
I just did some "perf" measurement of the overall "initdb" cycle,
and what I'm seeing suggests that bootstrap mode as such is now a
pretty small fraction of the overall cycle:+ 51.07% 0.01% 28 postgres postgres [.] PostgresMain # ... + 13.52% 0.00% 0 postgres postgres [.] AuxiliaryProcessMain #That says that the post-bootstrap steps are now the bulk of the time,
which agrees with naked-eye observation.
The AtEOXact_CatCache weren't only in bootstrap mode, but yea, it's by
far smaller wins in comparison to the regprocin thing.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/13/2017 06:13 PM, Tom Lane wrote:
I've pushed this with some mostly-cosmetic adjustments:
Thanks! I like your adjustments.
There's certainly lots more that could be done in the genbki code,
but I think all we can justify at this stage of the development
cycle is to get the low-hanging fruit for testing speedups.
Yeah, I also noticed that the genbki code seems to have gotten little
love and that much more can be done here.
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2017-04-13 14:05:43 -0400, Tom Lane wrote:
Hm. That ties into something I was looking at yesterday. The only
reason that function is called so much is that bootstrap mode runs a
separate transaction for *each line of the bki file* (cf do_start,
do_end in bootparse.y). Which seems pretty silly.
Indeed.
I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki. So I've pushed a patch that does
that. We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/14/2017 11:54 PM, Tom Lane wrote:
I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki. So I've pushed a patch that does
that. We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.
Looked some at this and what take time now for me seems to mainly be
these four things (out of a total runtime of 560 ms).
1. setup_conversion: 140 ms
2. select_default_timezone: 90 ms
3. bootstrap_template1: 80 ms
4. setup_schema: 65 ms
These four take up about two thirds of the total runtime, so it seems
likely that we may still have relatively low hanging fruit (but not
worth committing for PostgreSQL 10).
I have not done profiling of these functions yet, so am not sure how
they best would be fixed but maybe setup_conversion could be converted
into bki entries to speed it up.
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 15/04/17 13:44, Andreas Karlsson wrote:
On 04/14/2017 11:54 PM, Tom Lane wrote:
I failed to resist the temptation to poke at this, and found that
indeed nothing seems to break if we just use one transaction for the
whole processing of postgres.bki. So I've pushed a patch that does
that. We're definitely down to the point where worrying about the
speed of bootstrap mode, per se, is useless; the other steps in
initdb visibly take a lot more time.Looked some at this and what take time now for me seems to mainly be
these four things (out of a total runtime of 560 ms).1. setup_conversion: 140 ms
2. select_default_timezone: 90 ms
3. bootstrap_template1: 80 ms
4. setup_schema: 65 msThese four take up about two thirds of the total runtime, so it seems
likely that we may still have relatively low hanging fruit (but not
worth committing for PostgreSQL 10).I have not done profiling of these functions yet, so am not sure how
they best would be fixed but maybe setup_conversion could be converted
into bki entries to speed it up.Andreas
How much could be done concurrently?
Cheers.
Gavin
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andreas Karlsson <andreas@proxel.se> writes:
Looked some at this and what take time now for me seems to mainly be
these four things (out of a total runtime of 560 ms).
1. setup_conversion: 140 ms
2. select_default_timezone: 90 ms
3. bootstrap_template1: 80 ms
4. setup_schema: 65 ms
FWIW, you can bypass select_default_timezone by setting environment
variable TZ to a valid timezone name before calling initdb. On
my workstation, that currently cuts the time for "initdb --no-sync"
from about 1.25 sec to just about exactly 1.0 sec.
I doubt that we want all the buildfarm members to switch over to
doing that, since then we'd lose coverage of select_default_timezone.
But it's interesting to speculate about having the buildfarm script
extract the selected timezone name out of postgresql.conf after the
first initdb it does, and then set TZ for remaining tests. We surely
don't need to test select_default_timezone more than once per
buildfarm run.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/15/2017 12:07 AM, Tom Lane wrote:
Andreas Karlsson <andreas@proxel.se> writes:
Looked some at this and what take time now for me seems to mainly be
these four things (out of a total runtime of 560 ms).
1. setup_conversion: 140 ms
2. select_default_timezone: 90 ms
3. bootstrap_template1: 80 ms
4. setup_schema: 65 msFWIW, you can bypass select_default_timezone by setting environment
variable TZ to a valid timezone name before calling initdb. On
my workstation, that currently cuts the time for "initdb --no-sync"
from about 1.25 sec to just about exactly 1.0 sec.I doubt that we want all the buildfarm members to switch over to
doing that, since then we'd lose coverage of select_default_timezone.
But it's interesting to speculate about having the buildfarm script
extract the selected timezone name out of postgresql.conf after the
first initdb it does, and then set TZ for remaining tests. We surely
don't need to test select_default_timezone more than once per
buildfarm run.
Yeah. The obvious place to do this would be the "make check" stage,
which runs very early. Unfortunately, pg_regress carefully removes its
data directory on success - kind of a pity that's not switchable.
Probably for now the simplest thing for buildfarm animals whose owners
are trying to squeeze every last second would be to put something like
TZ => 'Us/Eastern',
in the build_env section of the config file. But as you say we don't
want everyone doing that.
Alternatively, we could have an initdb TAP test that explicitly removed
the environment setting so we'd get coverage of select_default_timezone,
and have the buildfarm set TZ to something if it's not already set.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan wrote:
Alternatively, we could have an initdb TAP test that explicitly removed
the environment setting so we'd get coverage of select_default_timezone,
and have the buildfarm set TZ to something if it's not already set.
What about having an initdb option that runs select_default_timezone
only and reports the result, so that it can be used in the buildfarm
script to set TZ in all the regular initdb calls?
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/15/2017 11:44 AM, Alvaro Herrera wrote:
Andrew Dunstan wrote:
Alternatively, we could have an initdb TAP test that explicitly removed
the environment setting so we'd get coverage of select_default_timezone,
and have the buildfarm set TZ to something if it's not already set.What about having an initdb option that runs select_default_timezone
only and reports the result, so that it can be used in the buildfarm
script to set TZ in all the regular initdb calls?
Seems like more work.
What I had in mind was the attached plus roughly this in the buildfarm
client:
$ENV{TZ} ||= 'US/Eastern';
or whatever zone we choose to use.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
initdb-test-no-tz.patchtext/x-diff; name=initdb-test-no-tz.patchDownload+10-2
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
What I had in mind was the attached plus roughly this in the buildfarm
client:
$ENV{TZ} ||= 'US/Eastern';
or whatever zone we choose to use.
How about letting the first "make check" run with whatever is in the
environment, and then forcing TZ to become set (much as above) for
all the remaining tests? I'm afraid what you've got here might
encourage a certain sameness of the test environments.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 04/15/2017 12:13 PM, Tom Lane wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
What I had in mind was the attached plus roughly this in the buildfarm
client:
$ENV{TZ} ||= 'US/Eastern';
or whatever zone we choose to use.How about letting the first "make check" run with whatever is in the
environment, and then forcing TZ to become set (much as above) for
all the remaining tests? I'm afraid what you've got here might
encourage a certain sameness of the test environments.
Sure. Just means putting this code a bit later in the file. "make check"
is only one initdb, so it won't cost much. I'm still inclined to force a
TAP test for initdb with no TZ set, though.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers