unused_oids script is broken with bsd sed
Hi.
BSD sed in macOS doesn't understand word boundary operator "\b". So after
372728b0d49 unused_oids doesn’t match all oids in new *.dat files marking
all of them as unused.
It is possible to write more portable regexps, e.g. change "\b" to
something like "^.*{*.*", but it seems easier for feature use to just
rewrite unused_oids in perl to match duplicate_oids. Also add in-place
complain about duplicates instead of running uniq through oids array.
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
0001-Rewrite-unused_oids-in-perl.patchapplication/octet-stream; name=0001-Rewrite-unused_oids-in-perl.patch; x-unix-mode=0644Download+45-31
On 04/25/2018 06:22 AM, Stas Kelvich wrote:
Hi.
BSD sed in macOS doesn't understand word boundary operator "\b". So after
372728b0d49 unused_oids doesn’t match all oids in new *.dat files marking
all of them as unused.It is possible to write more portable regexps, e.g. change "\b" to
something like "^.*{*.*", but it seems easier for feature use to just
rewrite unused_oids in perl to match duplicate_oids. Also add in-place
complain about duplicates instead of running uniq through oids array.
+many for rewriting in perl. Do you want to have a go at that? If not I
will.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
+many for rewriting in perl. Do you want to have a go at that? If not I
will.
+1 ... I was mildly astonished that this didn't already have to happen
as part of the bootstrap data conversion effort. It's certainly not
hard to imagine future extensions to the .dat file format that would
break this script, and duplicate_oids too. I think we should rewrite
both of them to use the Catalog.pm infrastructure.
regards, tom lane
On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
+many for rewriting in perl. Do you want to have a go at that? If not I
will.+1 ... I was mildly astonished that this didn't already have to happen
as part of the bootstrap data conversion effort. It's certainly not
hard to imagine future extensions to the .dat file format that would
break this script, and duplicate_oids too. I think we should rewrite
both of them to use the Catalog.pm infrastructure.regards, tom lane
Hm, I attached patch in first message, but seems that my mail client again
messed with attachment. However archive caught it:
/messages/by-id/attachment/60920/0001-Rewrite-unused_oids-in-perl.patch
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.
Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.
--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.
I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:
my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');
This is preferred over using awk, which would have its own portability
issues (Windows for starters).
While I'm thinking out loud, it might be worthwhile to patch genbki.pl
for the duplicate test, since they're run at the same time anyway (see
catalog/Makefile), and we've already read all the data.
-John Naylor
On Wed, Apr 25, 2018 at 09:55:54PM +0700, John Naylor wrote:
On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');
While we're improving things, there's not really a reason this program
should need to be run from any particular spot. It can detect where it
is and act on that information.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 04/25/2018 04:59 PM, David Fetter wrote:
While we're improving things, there's not really a reason this program
should need to be run from any particular spot. It can detect where it
is and act on that information.
+1
I would appreciate this change, and if Stats does not feel like adding
it to his patch I can write a patch for this myself.
Andreas
On 04/25/2018 10:55 AM, John Naylor wrote:
On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');This is preferred over using awk, which would have its own portability
issues (Windows for starters).While I'm thinking out loud, it might be worthwhile to patch genbki.pl
for the duplicate test, since they're run at the same time anyway (see
catalog/Makefile), and we've already read all the data.
The logic for getting the set of oids should be centralized, if not in
Catalog.pm then in a script which serves both for dup0licate_oids and
unused_oids.
Here is something I cobbled together for the latter approach. It could
probably improve by using Catalog::FindDefinedSymbol()
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
On 25 Apr 2018, at 17:55, John Naylor <jcnaylor@gmail.com> wrote:
On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');This is preferred over using awk, which would have its own portability
issues (Windows for starters).While I'm thinking out loud, it might be worthwhile to patch genbki.pl
for the duplicate test, since they're run at the same time anyway (see
catalog/Makefile), and we've already read all the data.-John Naylor
New version is attached. I've put iterator into Catalog.pm to avoid copy-pasting
it over two scripts. Also instead of greping through *.dat and *.h files I've
used parsers provided in Catalog module. That way should be more sustainable
to format changes.
Anyone who wanted to help with scripts -- feel free to rewrite.
Attachments:
0001-Rewrite-unused_oids-in-perl.patchapplication/octet-stream; name=0001-Rewrite-unused_oids-in-perl.patch; x-unix-mode=0644Download+85-49
On 4/25/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
+many for rewriting in perl. Do you want to have a go at that? If not I
will.+1 ... I was mildly astonished that this didn't already have to happen
as part of the bootstrap data conversion effort. It's certainly not
hard to imagine future extensions to the .dat file format that would
break this script, and duplicate_oids too. I think we should rewrite
both of them to use the Catalog.pm infrastructure.
If we're going to use Catalog.pm for that, it seems more convenient to
expose toast and index oids directly rather than in strings formatted
specifically for the bki file, as in the attached. Thoughts?
-John Naylor
Attachments:
expose-toast-and-index-oids.patchtext/x-patch; charset=US-ASCII; name=expose-toast-and-index-oids.patchDownload+14-20
John Naylor <jcnaylor@gmail.com> writes:
On 4/25/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.
If we're going to use Catalog.pm for that, it seems more convenient to
expose toast and index oids directly rather than in strings formatted
specifically for the bki file, as in the attached. Thoughts?
Good idea, I like this better than what Stas did in his patch.
I'm a bit inclined to preserve the old variable names (toast_name,
toast_oid etc) as the hash keys because they seem more greppable
than the generic "oid", "name" that you have here. But maybe that
isn't an interesting consideration.
regards, tom lane
On 4/26/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:
New version is attached. I've put iterator into Catalog.pm to avoid
copy-pasting
it over two scripts. Also instead of greping through *.dat and *.h files
I've
used parsers provided in Catalog module. That way should be more
sustainable
to format changes.Anyone who wanted to help with scripts -- feel free to rewrite.
Looks pretty good. Some comments:
Just after you posted, I sent a patch that tweaks the API of
Catalog.pm for toast and index oids. If you use that API in your
patch, you can get rid of the extra bookkeeping you added for those
oids.
The original scripts skipped the relation and rowtype oids for
bootstrap catalogs. You've replaced that with two data-level tests for
pg_class plus pg_type composite types. I think the original test was a
bit cleaner in this regard.
For this type of call:
+my @oids = @{ Catalog::FindAllOidsFromHeaders(@input_files) };
+foreach my $oid (@oids)
this style is used by other callers of the module:
+my $oids = Catalog::FindAllOidsFromHeaders(@input_files);
+foreach my $oid (@{ $oids })
For those following along, these scripts still assume we're in the
catalog directory. I can hack on that part tomorrow if no one else
has.
-John Naylor
John Naylor <jcnaylor@gmail.com> writes:
Just after you posted, I sent a patch that tweaks the API of
Catalog.pm for toast and index oids. If you use that API in your
patch, you can get rid of the extra bookkeeping you added for those
oids.
I've adjusted these two patches to play together, and pushed them.
The original scripts skipped the relation and rowtype oids for
bootstrap catalogs. You've replaced that with two data-level tests for
pg_class plus pg_type composite types. I think the original test was a
bit cleaner in this regard.
Yeah, I thought so too. Changed.
For those following along, these scripts still assume we're in the
catalog directory. I can hack on that part tomorrow if no one else
has.
I didn't touch this point.
I notice that duplicate_oids is now a good factor of 10 slower than it was
before (~0.04 sec to ~0.4 sec, on my machine). While this doesn't seem
like a problem for manual use, it seems annoying as part of the standard
build process, especially on slower buildfarm critters. I think we should
do what you said upthread and teach genbki.pl to complain about duplicate
oids, so that we can remove duplicate_oids from the build process.
I went ahead and pushed things as-is so we can confirm that duplicate_oids
has no portability issues that the buildfarm could find. But let's try
to get the build process change in place after a day or so.
regards, tom lane
On 4/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
John Naylor <jcnaylor@gmail.com> writes:
For those following along, these scripts still assume we're in the
catalog directory. I can hack on that part tomorrow if no one else
has.I didn't touch this point.
Patch 0002 is my attempt at this. Not sure how portable this is. It's
also clunkier than before in that you need to tell it where to find
Catalog.pm, since it seems Perl won't compile unless "use lib" points
to a string and not an expression. Suggestions welcome. I also edited
the boilerplate comments.
I notice that duplicate_oids is now a good factor of 10 slower than it was
before (~0.04 sec to ~0.4 sec, on my machine). While this doesn't seem
like a problem for manual use, it seems annoying as part of the standard
build process, especially on slower buildfarm critters.
If you're concerned about build speed, I think the generated *_d.h
headers cause a lot more files to be rebuilt than before. I'll think
about how to recover that.
I think we should
do what you said upthread and teach genbki.pl to complain about duplicate
oids, so that we can remove duplicate_oids from the build process.I went ahead and pushed things as-is so we can confirm that duplicate_oids
has no portability issues that the buildfarm could find. But let's try
to get the build process change in place after a day or so.
Patch 0001
-moves the compile-time test into genbki.pl
-removes a usability quirk from unused_oids
-removes duplicate_oids (not sure if you intended this, but since
unused_oids has already been committed to report duplicates, we may as
well standardize on that) and cleans up after that fact
-updates the documentation.
-John Naylor
Attachments:
0001-Remove-standalone-script-duplicate_oids.patchtext/x-patch; charset=US-ASCII; name=0001-Remove-standalone-script-duplicate_oids.patchDownload+80-93
0002-Remove-requirement-to-call-unused_oids-from-its-dire.patchtext/x-patch; charset=US-ASCII; name=0002-Remove-requirement-to-call-unused_oids-from-its-dire.patchDownload+28-15
On 04/26/2018 01:37 PM, John Naylor wrote:> Patch 0002 is my attempt at
this. Not sure how portable this is. It's
also clunkier than before in that you need to tell it where to find
Catalog.pm, since it seems Perl won't compile unless "use lib" points
to a string and not an expression. Suggestions welcome. I also edited
the boilerplate comments.
What about using FindBin (http://perldoc.perl.org/FindBin.html)?
Andreas
Andreas Karlsson <andreas@proxel.se> writes:
What about using FindBin (http://perldoc.perl.org/FindBin.html)?
A quick check suggests that that's present in the core perl
distribution pretty far back, so I think it'll work.
Question is, is it any better than John's "dirname(__FILE__)"
solution?
regards, tom lane
I wrote:
Andreas Karlsson <andreas@proxel.se> writes:
What about using FindBin (http://perldoc.perl.org/FindBin.html)?
A quick check suggests that that's present in the core perl
distribution pretty far back, so I think it'll work.
Question is, is it any better than John's "dirname(__FILE__)"
solution?
I experimented a bit, and found that actually what we seem to
need is FindBin::RealBin. That's the only one of these solutions
that gives the right answer if someone's put a symlink to
unused_oids or duplicate_oids into their PATH --- the other ones
give you the directory containing the symlink. Now, maybe that
would be an uncommon usage, but it hardly seems out of the question.
regards, tom lane
John Naylor <jcnaylor@gmail.com> writes:
On 4/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I notice that duplicate_oids is now a good factor of 10 slower than it was
before (~0.04 sec to ~0.4 sec, on my machine). While this doesn't seem
like a problem for manual use, it seems annoying as part of the standard
build process, especially on slower buildfarm critters.
If you're concerned about build speed, I think the generated *_d.h
headers cause a lot more files to be rebuilt than before. I'll think
about how to recover that.
Personally, I use ccache which doesn't seem to care too much, but I agree
than in some usages, extra touches of headers would be costly. Perhaps
it's worth adding logic to avoid overwriting an existing output file
unless it changed? I'm not sure; that would cost something too.
-removes duplicate_oids (not sure if you intended this, but since
unused_oids has already been committed to report duplicates, we may as
well standardize on that) and cleans up after that fact
I don't think anyone's asked for duplicate_oids to be removed altogether,
and I'm afraid that doing so would break existing workflows. For that
matter, now that I think about it, changing the behavior of unused_oids
as we did yesterday was ill-considered as well, so I undid that part.
I've pushed the current-directory-independence change by itself so that
we can see if any buildfarm members think it's problematic. (They'll
still be testing duplicate_oids, as things stand.) Once we have some
confirmation on the FindBin stuff actually being portable, I'll push
the changes to make genbki do duplicate checks and remove duplicate_oids
from the build sequence.
regards, tom lane
On Thu, Apr 26, 2018 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Personally, I use ccache which doesn't seem to care too much, but I agree
than in some usages, extra touches of headers would be costly. Perhaps
it's worth adding logic to avoid overwriting an existing output file
unless it changed? I'm not sure; that would cost something too.
It seems like a good idea to me.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company