unused_oids script is broken with bsd sed

Started by Stas Kelvich over 7 years ago, 27 messages
#1 Stas Kelvich
s.kelvich@postgrespro.ru
1 attachment(s)

Hi.

BSD sed on macOS doesn't understand the word-boundary operator "\b". So after
372728b0d49, unused_oids doesn't match the oids in the new *.dat files, marking
all of them as unused.

It is possible to write more portable regexps, e.g. change "\b" to
something like "^.*{*.*", but it seems easier for future use to just
rewrite unused_oids in Perl to match duplicate_oids. Also, complain about
duplicates in place instead of running uniq over the oids array.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-Rewrite-unused_oids-in-perl.patch (application/octet-stream)
From 756aeca9f6bf48c795cb84de7a131e5db8a4b33e Mon Sep 17 00:00:00 2001
From: Stas Kelvich <stanconn@gmail.com>
Date: Wed, 25 Apr 2018 13:00:28 +0300
Subject: [PATCH] Rewrite unused_oids in perl

unused_oids script was broken with bsd sed after 372728b0d49. It was
possible to write more portable regexps but it seems easier to just rewrite
unused_oids in perl to match duplicate_oids. Also add in-place complain
about duplicates instead of running uniq through oids array.
---
 src/include/catalog/unused_oids | 75 ++++++++++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index f71222d50d..c225cbaa70 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/usr/bin/perl
 #
 # unused_oids
 #
@@ -16,42 +16,57 @@
 #	run this script in src/include/catalog.
 #
 
+use strict;
+use warnings;
 
-AWK="awk"
+BEGIN
+{
+	@ARGV = (glob("pg_*.h"), glob("pg_*.dat"), qw(indexing.h toasting.h));
+}
 
-# Get FirstBootstrapObjectId from access/transam.h
-FIRSTOBJECTID=`grep '#define[ 	]*FirstBootstrapObjectId' ../access/transam.h | $AWK '{ print $3 }'`
-export FIRSTOBJECTID
+my $AWK = 'awk';
+my $FirstObjectId =`grep '#define[ 	]*FirstBootstrapObjectId' ../access/transam.h | $AWK '{ print \$3 }'`;
+if ($FirstObjectId !~ /^\d+$/)
+{
+	die "Could not find FirstBootstrapObjectId in ../access/transam.h";
+}
+my @oids = ($FirstObjectId);
 
 # this part (down to the uniq step) should match the duplicate_oids script
 # note: we exclude BKI_BOOTSTRAP relations since they are expected to have
 # matching data entries in pg_class.dat and pg_type.dat
+while (<>)
+{
+	next if /^CATALOG\(.*BKI_BOOTSTRAP/;
+	next
+	  unless /\boid *=> *'(\d+)'/
+		  || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+),/
+		  || /^CATALOG\([^,]*, *(\d+)/
+		  || /^DECLARE_INDEX\([^,]*, *(\d+)/
+		  || /^DECLARE_UNIQUE_INDEX\([^,]*, *(\d+)/
+		  || /^DECLARE_TOAST\([^,]*, *(\d+), *(\d+)/;
+	push @oids, $1;
+	push @oids, $2 if $2;
+}
 
-cat pg_*.h pg_*.dat toasting.h indexing.h |
-egrep -v -e '^CATALOG\(.*BKI_BOOTSTRAP' | \
-sed -n	-e 's/.*\boid *=> *'\''\([0-9][0-9]*\)'\''.*$/\1/p' \
-	-e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*BKI_ROWTYPE_OID(\([0-9][0-9]*\),.*$/\1,\2/p' \
-	-e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
-	-e 's/^DECLARE_INDEX([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
-	-e 's/^DECLARE_UNIQUE_INDEX([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
-	-e 's/^DECLARE_TOAST([^,]*, *\([0-9][0-9]*\), *\([0-9][0-9]*\).*$/\1,\2/p' | \
-tr ',' '\n' | \
-sort -n | \
-uniq | \
-$AWK '
-BEGIN {
-	last = 0;
-}
-/^[0-9]/ {
-	if ($1 > last + 1) {
-		if ($1 > last + 2) {
-			print last + 1, "-", $1 - 1;
-		} else {
-			print last + 1;
+my $prev_oid = 0;
+foreach my $oid (sort { $a <=> $b } @oids)
+{
+	if ($oid > $prev_oid + 1)
+	{
+		if ($oid > $prev_oid + 2)
+		{
+			print "@{[ $prev_oid + 1 ]} - @{[ $oid - 1 ]}\n";
+		}
+		else
+		{
+			print "@{[ $prev_oid + 1 ]}\n";
 		}
 	}
-	last = $1;
+	elsif ($oid == $prev_oid)
+	{
+		print "Duplicate oid detected: $oid\n";
+		exit 1;
+	}
+	$prev_oid = $oid;
 }
-END {
-	print last + 1, "-", ENVIRON["FIRSTOBJECTID"]-1;
-}'
-- 
2.16.2

#2 Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Stas Kelvich (#1)
Re: unused_oids script is broken with bsd sed

On 04/25/2018 06:22 AM, Stas Kelvich wrote:

Hi.

BSD sed on macOS doesn't understand the word-boundary operator "\b". So after
372728b0d49, unused_oids doesn't match the oids in the new *.dat files, marking
all of them as unused.

It is possible to write more portable regexps, e.g. change "\b" to
something like "^.*{*.*", but it seems easier for future use to just
rewrite unused_oids in Perl to match duplicate_oids. Also, complain about
duplicates in place instead of running uniq over the oids array.

+many for rewriting in perl. Do you want to have a go at that? If not I
will.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#2)
Re: unused_oids script is broken with bsd sed

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

+many for rewriting in perl. Do you want to have a go at that? If not I
will.

+1 ... I was mildly astonished that this didn't already have to happen
as part of the bootstrap data conversion effort. It's certainly not
hard to imagine future extensions to the .dat file format that would
break this script, and duplicate_oids too. I think we should rewrite
both of them to use the Catalog.pm infrastructure.

regards, tom lane

#4 Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Tom Lane (#3)
Re: unused_oids script is broken with bsd sed

On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

+many for rewriting in perl. Do you want to have a go at that? If not I
will.

+1 ... I was mildly astonished that this didn't already have to happen
as part of the bootstrap data conversion effort. It's certainly not
hard to imagine future extensions to the .dat file format that would
break this script, and duplicate_oids too. I think we should rewrite
both of them to use the Catalog.pm infrastructure.

regards, tom lane

Hm, I attached the patch to the first message, but it seems my mail client
messed with the attachment again. The archive caught it, though:

/messages/by-id/attachment/60920/0001-Rewrite-unused_oids-in-perl.patch

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#5 Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: Tom Lane (#3)
Re: unused_oids script is broken with bsd sed

On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#6 John Naylor
jcnaylor@gmail.com
In reply to: Stas Kelvich (#5)
Re: unused_oids script is broken with bsd sed

On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.

I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:

my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');

This is preferred over using awk, which would have its own portability
issues (Windows for starters).

While I'm thinking out loud, it might be worthwhile to patch genbki.pl
for the duplicate test, since they're run at the same time anyway (see
catalog/Makefile), and we've already read all the data.

-John Naylor

#7 David Fetter
david@fetter.org
In reply to: John Naylor (#6)
Re: unused_oids script is broken with bsd sed

On Wed, Apr 25, 2018 at 09:55:54PM +0700, John Naylor wrote:

On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.

I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:

my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');

While we're improving things, there's not really a reason this program
should need to be run from any particular spot. It can detect where it
is and act on that information.
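
For what it's worth, a minimal sketch of that idea (an illustration only, not
the eventual patch; it assumes the script stays in src/include/catalog):

use File::Basename qw(dirname);

# run relative to the script's own location instead of the caller's cwd
my $catdir = dirname(__FILE__);
chdir $catdir or die "chdir $catdir: $!";

# after which the existing globs work from any starting directory
my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));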

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#8 Andreas Karlsson
andreas@proxel.se
In reply to: David Fetter (#7)
Re: unused_oids script is broken with bsd sed

On 04/25/2018 04:59 PM, David Fetter wrote:

While we're improving things, there's not really a reason this program
should need to be run from any particular spot. It can detect where it
is and act on that information.

+1

I would appreciate this change, and if Stas does not feel like adding
it to his patch I can write a patch for this myself.

Andreas

#9 Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: John Naylor (#6)
1 attachment(s)
Re: unused_oids script is broken with bsd sed

On 04/25/2018 10:55 AM, John Naylor wrote:

On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.

I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:

my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');

This is preferred over using awk, which would have its own portability
issues (Windows for starters).

While I'm thinking out loud, it might be worthwhile to patch genbki.pl
for the duplicate test, since they're run at the same time anyway (see
catalog/Makefile), and we've already read all the data.

The logic for getting the set of oids should be centralized, if not in
Catalog.pm then in a script which serves both duplicate_oids and
unused_oids.

Here is something I cobbled together for the latter approach. It could
probably be improved by using Catalog::FindDefinedSymbol().

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

unused_oids.pl (application/x-perl)
#10 Stas Kelvich
s.kelvich@postgrespro.ru
In reply to: John Naylor (#6)
1 attachment(s)
Re: unused_oids script is broken with bsd sed

On 25 Apr 2018, at 17:55, John Naylor <jcnaylor@gmail.com> wrote:

On 4/25/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

On 25 Apr 2018, at 17:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think we should rewrite
both of them to use the Catalog.pm infrastructure.

Okay, seems reasonable. I'll put shared code in Catalog.pm and
update patch.

I don't think you need any new code in Catalog.pm, I believe the
suggestion was just to use that module as a stable interface to the
data. Looking at your patch, I'll mention that we have an idiom for
extracting #define'd OID symbols, e.g.:

my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
'access/transam.h', \@include_path, 'FirstBootstrapObjectId');

This is preferred over using awk, which would have its own portability
issues (Windows for starters).

While I'm thinking out loud, it might be worthwhile to patch genbki.pl
for the duplicate test, since they're run at the same time anyway (see
catalog/Makefile), and we've already read all the data.

-John Naylor

A new version is attached. I've put the iterator into Catalog.pm to avoid
copy-pasting it across the two scripts. Also, instead of grepping through the
*.dat and *.h files, I've used the parsers provided by the Catalog module.
That should be more robust against format changes.

Anyone who wants to help with the scripts -- feel free to rewrite.

Attachments:

0001-Rewrite-unused_oids-in-perl.patch (application/octet-stream)
From a86a01be7f35d2f71cdfc3e887d29e2d5edf820a Mon Sep 17 00:00:00 2001
From: Stas Kelvich <stanconn@gmail.com>
Date: Wed, 25 Apr 2018 13:00:28 +0300
Subject: [PATCH] Rewrite unused_oids in perl

unused_oids script was broken with bsd sed after 372728b0d49. It was
possible to write more portable regexps but it seems easier to just rewrite
unused_oids in perl to match duplicate_oids. Also add in-place complain
about duplicates instead of running uniq through oids array.
---
 src/backend/catalog/Catalog.pm     | 47 ++++++++++++++++++++++++++++
 src/include/catalog/duplicate_oids | 23 +++++---------
 src/include/catalog/unused_oids    | 63 ++++++++++++++++++--------------------
 3 files changed, 85 insertions(+), 48 deletions(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 6305a2b362..5caf428b6a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -40,7 +40,9 @@ sub ParseHeader
 
 		$catalog{columns} = [];
 		$catalog{toasting} = [];
+		$catalog{toasting_oids} = [];
 		$catalog{indexing} = [];
+		$catalog{indexing_oids} = [];
 		$catalog{client_code} = [];
 
 		open(my $ifh, '<', $input_file) || die "$input_file: $!";
@@ -89,6 +91,7 @@ sub ParseHeader
 				my ($toast_name, $toast_oid, $index_oid) = ($1, $2, $3);
 				push @{ $catalog{toasting} },
 				  "declare toast $toast_oid $index_oid on $toast_name\n";
+				push @{ $catalog{toasting_oids} }, ($toast_oid, $index_oid);
 			}
 			elsif (/^DECLARE_(UNIQUE_)?INDEX\(\s*(\w+),\s*(\d+),\s*(.+)\)/)
 			{
@@ -99,6 +102,7 @@ sub ParseHeader
 					"declare %sindex %s %s %s\n",
 					$is_unique ? 'unique ' : '',
 					$index_name, $index_oid, $using);
+				push @{ $catalog{indexing_oids} }, ($index_oid);
 			}
 			elsif (/^BUILD_INDICES/)
 			{
@@ -389,4 +393,47 @@ sub FindDefinedSymbolFromData
 	die "no definition found for $symbol\n";
 }
 
+sub FindAllOidsFromHeaders
+{
+	my @input_files = @_;
+	my @oids = ();
+
+	foreach my $header (@input_files)
+	{
+		$header =~ /(.+)\.h$/
+		or die "Input files need to be header files.\n";
+		my $datfile = "$1.dat";
+
+		my $catalog = Catalog::ParseHeader($header);
+
+		push @oids, $catalog->{rowtype_oid} if ($catalog->{rowtype_oid});
+		push @oids, $catalog->{relation_oid} if ($catalog->{relation_oid});
+
+		# pg_class row oids are duplicated in typrelid of pg_type
+		if (exists($catalog->{catname}) and ($catalog->{catname} eq 'pg_class'))
+		{
+			next;
+		}
+
+		if (-e $datfile)
+		{
+			my $catdata = Catalog::ParseData($datfile, $catalog->{columns}, 0);
+			foreach my $row (@$catdata)
+			{
+				# filter entries for bootstrapped catalogs entries in pg_type
+				if (($catalog->{catname} eq 'pg_type') and ($row->{typcategory} eq 'C'))
+				{
+					next;
+				}
+				push @oids, $row->{oid} if defined $row->{oid};
+			}
+		}
+
+		push @oids, @{ $catalog->{toasting_oids} };
+		push @oids, @{ $catalog->{indexing_oids} };
+	}
+
+	return \@oids;
+}
+
 1;
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
index 8c143cf06f..c818dc8a67 100755
--- a/src/include/catalog/duplicate_oids
+++ b/src/include/catalog/duplicate_oids
@@ -1,27 +1,20 @@
 #!/usr/bin/perl
 
+use lib '../../backend/catalog/';
+use Catalog;
+
 use strict;
 use warnings;
 
-BEGIN
-{
-	@ARGV = (glob("pg_*.h"), glob("pg_*.dat"), qw(indexing.h toasting.h));
-}
+my @input_files =  (glob("pg_*.h"), qw(indexing.h toasting.h));
+
+my @oids = @{ Catalog::FindAllOidsFromHeaders(@input_files) };
 
 my %oidcounts;
 
-while (<>)
+foreach my $oid (@oids)
 {
-	next if /^CATALOG\(.*BKI_BOOTSTRAP/;
-	next
-	  unless /\boid *=> *'(\d+)'/
-		  || /^CATALOG\([^,]*, *(\d+).*BKI_ROWTYPE_OID\((\d+),/
-		  || /^CATALOG\([^,]*, *(\d+)/
-		  || /^DECLARE_INDEX\([^,]*, *(\d+)/
-		  || /^DECLARE_UNIQUE_INDEX\([^,]*, *(\d+)/
-		  || /^DECLARE_TOAST\([^,]*, *(\d+), *(\d+)/;
-	$oidcounts{$1}++;
-	$oidcounts{$2}++ if $2;
+	$oidcounts{$oid}++;
 }
 
 my $found = 0;
diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index f71222d50d..4b0e16cc02 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/usr/bin/perl
 #
 # unused_oids
 #
@@ -15,43 +15,40 @@
 #
 #	run this script in src/include/catalog.
 #
+use lib '../../backend/catalog/';
+use Catalog;
 
+use strict;
+use warnings;
 
-AWK="awk"
+my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
+my @oids;
 
-# Get FirstBootstrapObjectId from access/transam.h
-FIRSTOBJECTID=`grep '#define[ 	]*FirstBootstrapObjectId' ../access/transam.h | $AWK '{ print $3 }'`
-export FIRSTOBJECTID
+push @oids, @{ Catalog::FindAllOidsFromHeaders(@input_files) };
 
-# this part (down to the uniq step) should match the duplicate_oids script
-# note: we exclude BKI_BOOTSTRAP relations since they are expected to have
-# matching data entries in pg_class.dat and pg_type.dat
+my $FirstBootstrapObjectId = Catalog::FindDefinedSymbol(
+	'access/transam.h', [".."], 'FirstBootstrapObjectId');
+push @oids, $FirstBootstrapObjectId;
 
-cat pg_*.h pg_*.dat toasting.h indexing.h |
-egrep -v -e '^CATALOG\(.*BKI_BOOTSTRAP' | \
-sed -n	-e 's/.*\boid *=> *'\''\([0-9][0-9]*\)'\''.*$/\1/p' \
-	-e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*BKI_ROWTYPE_OID(\([0-9][0-9]*\),.*$/\1,\2/p' \
-	-e 's/^CATALOG([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
-	-e 's/^DECLARE_INDEX([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
-	-e 's/^DECLARE_UNIQUE_INDEX([^,]*, *\([0-9][0-9]*\).*$/\1/p' \
-	-e 's/^DECLARE_TOAST([^,]*, *\([0-9][0-9]*\), *\([0-9][0-9]*\).*$/\1,\2/p' | \
-tr ',' '\n' | \
-sort -n | \
-uniq | \
-$AWK '
-BEGIN {
-	last = 0;
-}
-/^[0-9]/ {
-	if ($1 > last + 1) {
-		if ($1 > last + 2) {
-			print last + 1, "-", $1 - 1;
-		} else {
-			print last + 1;
+
+my $prev_oid = 0;
+foreach my $oid (sort { $a <=> $b } @oids)
+{
+	if ($oid > $prev_oid + 1)
+	{
+		if ($oid > $prev_oid + 2)
+		{
+			print "@{[ $prev_oid + 1 ]} - @{[ $oid - 1 ]}\n";
 		}
+		else
+		{
+			print "@{[ $prev_oid + 1 ]}\n";
+		}
+	}
+	elsif ($oid == $prev_oid)
+	{
+		print "Duplicate oid detected: $oid\n";
+		exit 1;
 	}
-	last = $1;
+	$prev_oid = $oid;
 }
-END {
-	print last + 1, "-", ENVIRON["FIRSTOBJECTID"]-1;
-}'
-- 
2.16.2

#11 John Naylor
jcnaylor@gmail.com
In reply to: Tom Lane (#3)
1 attachment(s)
Re: unused_oids script is broken with bsd sed

On 4/25/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

+many for rewriting in perl. Do you want to have a go at that? If not I
will.

+1 ... I was mildly astonished that this didn't already have to happen
as part of the bootstrap data conversion effort. It's certainly not
hard to imagine future extensions to the .dat file format that would
break this script, and duplicate_oids too. I think we should rewrite
both of them to use the Catalog.pm infrastructure.

If we're going to use Catalog.pm for that, it seems more convenient to
expose toast and index oids directly rather than in strings formatted
specifically for the bki file, as in the attached. Thoughts?

-John Naylor

Attachments:

expose-toast-and-index-oids.patch (text/x-patch)
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 6305a2b..601d680 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -86,23 +86,13 @@ sub ParseHeader
 			# Push the data into the appropriate data structure.
 			if (/^DECLARE_TOAST\(\s*(\w+),\s*(\d+),\s*(\d+)\)/)
 			{
-				my ($toast_name, $toast_oid, $index_oid) = ($1, $2, $3);
 				push @{ $catalog{toasting} },
-				  "declare toast $toast_oid $index_oid on $toast_name\n";
+				  {name => $1, oid => $2, index_oid => $3};
 			}
 			elsif (/^DECLARE_(UNIQUE_)?INDEX\(\s*(\w+),\s*(\d+),\s*(.+)\)/)
 			{
-				my ($is_unique, $index_name, $index_oid, $using) =
-				  ($1, $2, $3, $4);
 				push @{ $catalog{indexing} },
-				  sprintf(
-					"declare %sindex %s %s %s\n",
-					$is_unique ? 'unique ' : '',
-					$index_name, $index_oid, $using);
-			}
-			elsif (/^BUILD_INDICES/)
-			{
-				push @{ $catalog{indexing} }, "build indices\n";
+				  {is_unique => $1, name => $2, oid => $3, using => $4};
 			}
 			elsif (/^CATALOG\((\w+),(\d+),(\w+)\)/)
 			{
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 9cf2626..bb6a99d 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -100,13 +100,18 @@ foreach my $header (@input_files)
 		$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
 	}
 
-	foreach my $toast_decl (@{ $catalog->{toasting} })
+	foreach my $toast (@{ $catalog->{toasting} })
 	{
-		push @toast_decls, $toast_decl;
+		push @toast_decls,
+		  sprintf "declare toast %s %s on %s\n",
+		    $toast->{oid}, $toast->{index_oid}, $toast->{name};
 	}
-	foreach my $index_decl (@{ $catalog->{indexing} })
+	foreach my $index (@{ $catalog->{indexing} })
 	{
-		push @index_decls, $index_decl;
+		push @index_decls,
+		  sprintf "declare %sindex %s %s %s\n",
+			$index->{is_unique} ? 'unique ' : '',
+			$index->{name}, $index->{oid}, $index->{using};
 	}
 }
 
@@ -463,6 +468,9 @@ foreach my $declaration (@index_decls)
 	print $bki $declaration;
 }
 
+# last command in the BKI file: build the indexes declared above
+print $bki "build indices\n";
+
 
 # Now generate schemapg.h
 
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 42499e2..2491582 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -47,7 +47,6 @@ extern void CatalogTupleDelete(Relation heapRel, ItemPointer tid);
  */
 #define DECLARE_INDEX(name,oid,decl) extern int no_such_variable
 #define DECLARE_UNIQUE_INDEX(name,oid,decl) extern int no_such_variable
-#define BUILD_INDICES
 
 
 /*
@@ -361,7 +360,4 @@ DECLARE_UNIQUE_INDEX(pg_subscription_subname_index, 6115, on pg_subscription usi
 DECLARE_UNIQUE_INDEX(pg_subscription_rel_srrelid_srsubid_index, 6117, on pg_subscription_rel using btree(srrelid oid_ops, srsubid oid_ops));
 #define SubscriptionRelSrrelidSrsubidIndexId 6117
 
-/* last step of initialization script: build the indexes declared above */
-BUILD_INDICES
-
 #endif							/* INDEXING_H */
#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#11)
Re: unused_oids script is broken with bsd sed

John Naylor <jcnaylor@gmail.com> writes:

On 4/25/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think we should rewrite
both of them to use the Catalog.pm infrastructure.

If we're going to use Catalog.pm for that, it seems more convenient to
expose toast and index oids directly rather than in strings formatted
specifically for the bki file, as in the attached. Thoughts?

Good idea, I like this better than what Stas did in his patch.

I'm a bit inclined to preserve the old variable names (toast_name,
toast_oid etc) as the hash keys because they seem more greppable
than the generic "oid", "name" that you have here. But maybe that
isn't an interesting consideration.

regards, tom lane

#13 John Naylor
jcnaylor@gmail.com
In reply to: Stas Kelvich (#10)
Re: unused_oids script is broken with bsd sed

On 4/26/18, Stas Kelvich <s.kelvich@postgrespro.ru> wrote:

A new version is attached. I've put the iterator into Catalog.pm to avoid
copy-pasting it across the two scripts. Also, instead of grepping through the
*.dat and *.h files, I've used the parsers provided by the Catalog module.
That should be more robust against format changes.

Anyone who wants to help with the scripts -- feel free to rewrite.

Looks pretty good. Some comments:

Just after you posted, I sent a patch that tweaks the API of
Catalog.pm for toast and index oids. If you use that API in your
patch, you can get rid of the extra bookkeeping you added for those
oids.

The original scripts skipped the relation and rowtype oids for
bootstrap catalogs. You've replaced that with two data-level tests for
pg_class plus pg_type composite types. I think the original test was a
bit cleaner in this regard.

For this type of call:

+my @oids = @{ Catalog::FindAllOidsFromHeaders(@input_files) };
+foreach my $oid (@oids)

this style is used by other callers of the module:

+my $oids = Catalog::FindAllOidsFromHeaders(@input_files);
+foreach my $oid (@{ $oids })

For those following along, these scripts still assume we're in the
catalog directory. I can hack on that part tomorrow if no one else
has.

-John Naylor

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#13)
Re: unused_oids script is broken with bsd sed

John Naylor <jcnaylor@gmail.com> writes:

Just after you posted, I sent a patch that tweaks the API of
Catalog.pm for toast and index oids. If you use that API in your
patch, you can get rid of the extra bookkeeping you added for those
oids.

I've adjusted these two patches to play together, and pushed them.

The original scripts skipped the relation and rowtype oids for
bootstrap catalogs. You've replaced that with two data-level tests for
pg_class plus pg_type composite types. I think the original test was a
bit cleaner in this regard.

Yeah, I thought so too. Changed.

For those following along, these scripts still assume we're in the
catalog directory. I can hack on that part tomorrow if no one else
has.

I didn't touch this point.

I notice that duplicate_oids is now a good factor of 10 slower than it was
before (~0.04 sec to ~0.4 sec, on my machine). While this doesn't seem
like a problem for manual use, it seems annoying as part of the standard
build process, especially on slower buildfarm critters. I think we should
do what you said upthread and teach genbki.pl to complain about duplicate
oids, so that we can remove duplicate_oids from the build process.

I went ahead and pushed things as-is so we can confirm that duplicate_oids
has no portability issues that the buildfarm could find. But let's try
to get the build process change in place after a day or so.

regards, tom lane

#15 John Naylor
jcnaylor@gmail.com
In reply to: Tom Lane (#14)
2 attachment(s)
Re: unused_oids script is broken with bsd sed

On 4/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

John Naylor <jcnaylor@gmail.com> writes:

For those following along, these scripts still assume we're in the
catalog directory. I can hack on that part tomorrow if no one else
has.

I didn't touch this point.

Patch 0002 is my attempt at this. Not sure how portable this is. It's
also clunkier than before in that you need to tell it where to find
Catalog.pm, since it seems Perl won't compile unless "use lib" points
to a string and not an expression. Suggestions welcome. I also edited
the boilerplate comments.
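
The underlying quirk is that "use" statements run at compile time, before any
ordinary runtime assignment. As a hedged sketch, one classic workaround is to
do the assignment in a BEGIN block (this is an illustration only, not what the
attached patch does; the patch documents passing the module path with perl -I):

use strict;
use warnings;
use File::Basename qw(dirname);

my $catdir;
BEGIN { $catdir = dirname(__FILE__); }    # runs at compile time, in source order
use lib "$catdir/../../backend/catalog";  # so the path is already known here
use Catalog;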

I notice that duplicate_oids is now a good factor of 10 slower than it was
before (~0.04 sec to ~0.4 sec, on my machine). While this doesn't seem
like a problem for manual use, it seems annoying as part of the standard
build process, especially on slower buildfarm critters.

If you're concerned about build speed, I think the generated *_d.h
headers cause a lot more files to be rebuilt than before. I'll think
about how to recover that.

I think we should
do what you said upthread and teach genbki.pl to complain about duplicate
oids, so that we can remove duplicate_oids from the build process.

I went ahead and pushed things as-is so we can confirm that duplicate_oids
has no portability issues that the buildfarm could find. But let's try
to get the build process change in place after a day or so.

Patch 0001
-moves the compile-time test into genbki.pl
-removes a usability quirk from unused_oids
-removes duplicate_oids (not sure if you intended this, but since
unused_oids has already been committed to report duplicates, we may as
well standardize on that) and cleans up after that fact
-updates the documentation.

-John Naylor

Attachments:

0001-Remove-standalone-script-duplicate_oids.patch (text/x-patch)
From 9cf5c517b6a9359fb1863e3315f5412860db45ff Mon Sep 17 00:00:00 2001
From: John Naylor <jcnaylor@gmail.com>
Date: Thu, 26 Apr 2018 15:54:53 +0700
Subject: [PATCH 1/2] Remove standalone script duplicate_oids

Commit 5602265f770 added duplicate checking to unused_oids, so use
that script for manual checks. Also change it so that it doesn't
exit when encountering a duplicate -- doing so slows down fixing
multiple duplicates.

This left unused_oids as the only caller of FindAllOidsFromHeaders().
Since that function no longer offers any useful abstraction, remove
it from Catalog.pm and inline it into unused_oids.

Make genbki.pl responsible for compile time duplicate checks. This
simplifies the Makefile a bit and keeps from having to read the
catalog files twice.
---
 doc/src/sgml/bki.sgml              |  9 +++----
 src/backend/catalog/Catalog.pm     | 51 --------------------------------------
 src/backend/catalog/Makefile       |  3 +--
 src/backend/catalog/genbki.pl      | 31 ++++++++++++++++++++++-
 src/include/catalog/duplicate_oids | 29 ----------------------
 src/include/catalog/unused_oids    | 49 +++++++++++++++++++++++++++++++++---
 6 files changed, 80 insertions(+), 92 deletions(-)
 delete mode 100755 src/include/catalog/duplicate_oids

diff --git a/doc/src/sgml/bki.sgml b/doc/src/sgml/bki.sgml
index 5a4cd39..087272c 100644
--- a/doc/src/sgml/bki.sgml
+++ b/doc/src/sgml/bki.sgml
@@ -377,13 +377,12 @@
     script <filename>src/include/catalog/unused_oids</filename>.
     It prints inclusive ranges of unused OIDs (e.g., the output
     line <quote>45-900</quote> means OIDs 45 through 900 have not been
-    allocated yet).  Currently, OIDs 1-9999 are reserved for manual
+    allocated yet), as well as any duplicate OIDs found.
+    Currently, OIDs 1-9999 are reserved for manual
     assignment; the <filename>unused_oids</filename> script simply looks
     through the catalog headers and <filename>.dat</filename> files
-    to see which ones do not appear.  You can also use
-    the <filename>duplicate_oids</filename> script to check for mistakes.
-    (That script is run automatically at compile time, and will stop the
-    build if a duplicate is found.)
+    to see which ones do not appear.  In addition, if genbki.pl detects
+    duplicates at compile time, it will stop the build.
    </para>
 
    <para>
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 0b057a8..823e09a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -384,55 +384,4 @@ sub FindDefinedSymbolFromData
 	die "no definition found for $symbol\n";
 }
 
-# Extract an array of all the OIDs assigned in the specified catalog headers
-# and their associated data files (if any).
-sub FindAllOidsFromHeaders
-{
-	my @input_files = @_;
-
-	my @oids = ();
-
-	foreach my $header (@input_files)
-	{
-		$header =~ /(.+)\.h$/
-		  or die "Input files need to be header files.\n";
-		my $datfile = "$1.dat";
-
-		my $catalog = Catalog::ParseHeader($header);
-
-		# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
-		# as those are expected to appear in the initial data for pg_class
-		# and pg_type.  For regular catalogs, include these OIDs.
-		if (!$catalog->{bootstrap})
-		{
-			push @oids, $catalog->{relation_oid}
-			  if ($catalog->{relation_oid});
-			push @oids, $catalog->{rowtype_oid} if ($catalog->{rowtype_oid});
-		}
-
-		# Not all catalogs have a data file.
-		if (-e $datfile)
-		{
-			my $catdata =
-			  Catalog::ParseData($datfile, $catalog->{columns}, 0);
-
-			foreach my $row (@$catdata)
-			{
-				push @oids, $row->{oid} if defined $row->{oid};
-			}
-		}
-
-		foreach my $toast (@{ $catalog->{toasting} })
-		{
-			push @oids, $toast->{toast_oid}, $toast->{toast_index_oid};
-		}
-		foreach my $index (@{ $catalog->{indexing} })
-		{
-			push @oids, $index->{index_oid};
-		}
-	}
-
-	return \@oids;
-}
-
 1;
diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile
index d25d98a..a54197d 100644
--- a/src/backend/catalog/Makefile
+++ b/src/backend/catalog/Makefile
@@ -84,8 +84,7 @@ generated-header-symlinks: $(top_builddir)/src/include/catalog/header-stamp
 # configure run, even in distribution tarballs.  So depending on configure.in
 # instead is cheating a bit, but it will achieve the goal of updating the
 # version number when it changes.
-bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in $(top_srcdir)/src/include/catalog/duplicate_oids
-	cd $(top_srcdir)/src/include/catalog && $(PERL) ./duplicate_oids
+bki-stamp: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(POSTGRES_BKI_DATA) $(top_srcdir)/configure.in
 	$(PERL) -I $(catalogdir) $< --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS)
 	touch $@
 
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index 83b6158..199a068 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -78,6 +78,7 @@ my %catalogs;
 my %catalog_data;
 my @toast_decls;
 my @index_decls;
+my %oidcounts;		# number of times we see an OID
 foreach my $header (@input_files)
 {
 	$header =~ /(.+)\.h$/
@@ -94,10 +95,27 @@ foreach my $header (@input_files)
 		$catalogs{$catname} = $catalog;
 	}
 
+	# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
+	# as those are expected to appear in the initial data for pg_class
+	# and pg_type.  For regular catalogs, include these OIDs.
+	if (!$catalog->{bootstrap})
+	{
+		$oidcounts{ $catalog->{relation_oid} }++
+		  if ($catalog->{relation_oid});
+		$oidcounts{ $catalog->{rowtype_oid} }++
+		  if ($catalog->{rowtype_oid});
+	}
+
 	# Not all catalogs have a data file.
 	if (-e $datfile)
 	{
-		$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
+		my $data = Catalog::ParseData($datfile, $schema, 0);
+		$catalog_data{$catname} = $data;
+		foreach my $row (@$data)
+		{
+			$oidcounts{ $row->{oid} }++ if defined $row->{oid};
+		}
+
 	}
 
 	# If the header file contained toast or index info, build BKI
@@ -119,6 +137,17 @@ foreach my $header (@input_files)
 	}
 }
 
+# Exit if we have any duplicate OIDs.
+my $found = 0;
+foreach my $oid (keys %oidcounts)
+{
+	next unless $oidcounts{$oid} > 1;
+	print "Duplicate oids detected:\n" if !$found;
+	$found = 1;
+	print "$oid\n";
+}
+die if $found;
+
 # Fetch some special data that we will substitute into the output file.
 # CAUTION: be wary about what symbols you substitute into the .bki file here!
 # It's okay to substitute things that are expected to be really constant
diff --git a/src/include/catalog/duplicate_oids b/src/include/catalog/duplicate_oids
deleted file mode 100755
index 0d7aa15..0000000
--- a/src/include/catalog/duplicate_oids
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/usr/bin/perl
-
-use lib '../../backend/catalog/';
-use Catalog;
-
-use strict;
-use warnings;
-
-my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
-
-my $oids = Catalog::FindAllOidsFromHeaders(@input_files);
-
-my %oidcounts;
-
-foreach my $oid (@{$oids})
-{
-	$oidcounts{$oid}++;
-}
-
-my $found = 0;
-
-foreach my $oid (sort { $a <=> $b } keys %oidcounts)
-{
-	next unless $oidcounts{$oid} > 1;
-	$found = 1;
-	print "$oid\n";
-}
-
-exit $found;
diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index a727225..333b547 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -23,16 +23,58 @@ use warnings;
 
 my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
 
-my $oids = Catalog::FindAllOidsFromHeaders(@input_files);
+# Extract an array of all the OIDs assigned in the specified catalog headers
+# and their associated data files (if any).
+my @oids;
+foreach my $header (@input_files)
+{
+	$header =~ /(.+)\.h$/
+	  or die "Input files need to be header files.\n";
+	my $datfile = "$1.dat";
+
+	my $catalog = Catalog::ParseHeader($header);
+
+	# We ignore the pg_class OID and rowtype OID of bootstrap catalogs,
+	# as those are expected to appear in the initial data for pg_class
+	# and pg_type.  For regular catalogs, include these OIDs.
+	if (!$catalog->{bootstrap})
+	{
+		push @oids, $catalog->{relation_oid}
+		  if ($catalog->{relation_oid});
+		push @oids, $catalog->{rowtype_oid}
+		  if ($catalog->{rowtype_oid});
+	}
+
+	# Not all catalogs have a data file.
+	if (-e $datfile)
+	{
+		my $catdata =
+		  Catalog::ParseData($datfile, $catalog->{columns}, 0);
+
+		foreach my $row (@$catdata)
+		{
+			push @oids, $row->{oid} if defined $row->{oid};
+		}
+	}
+
+	foreach my $toast (@{ $catalog->{toasting} })
+	{
+		push @oids, $toast->{toast_oid}, $toast->{toast_index_oid};
+	}
+	foreach my $index (@{ $catalog->{indexing} })
+	{
+		push @oids, $index->{index_oid};
+	}
+}
 
 # Also push FirstBootstrapObjectId to serve as a terminator for the last gap.
 my $FirstBootstrapObjectId =
   Catalog::FindDefinedSymbol('access/transam.h', [".."],
 	'FirstBootstrapObjectId');
-push @{$oids}, $FirstBootstrapObjectId;
+push @oids, $FirstBootstrapObjectId;
 
 my $prev_oid = 0;
-foreach my $oid (sort { $a <=> $b } @{$oids})
+foreach my $oid (sort { $a <=> $b } @oids)
 {
 	if ($oid > $prev_oid + 1)
 	{
@@ -48,7 +90,6 @@ foreach my $oid (sort { $a <=> $b } @{$oids})
 	elsif ($oid == $prev_oid)
 	{
 		print "Duplicate oid detected: $oid\n";
-		exit 1;
 	}
 	$prev_oid = $oid;
 }
-- 
2.7.4

0002-Remove-requirement-to-call-unused_oids-from-its-dire.patch (text/x-patch)
From 337d1eee0473dd5742993161256692bed25097fa Mon Sep 17 00:00:00 2001
From: John Naylor <jcnaylor@gmail.com>
Date: Thu, 26 Apr 2018 17:50:28 +0700
Subject: [PATCH 2/2] Remove requirement to call unused_oids from its directory

No doc change, since that fact wasn't mentioned before.

Also do an editing pass over the boilerplate comments.
---
 src/include/catalog/unused_oids | 42 +++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/src/include/catalog/unused_oids b/src/include/catalog/unused_oids
index 333b547..120f744 100755
--- a/src/include/catalog/unused_oids
+++ b/src/include/catalog/unused_oids
@@ -1,33 +1,47 @@
-#!/usr/bin/perl
+#!/usr/bin/perl -w
+#----------------------------------------------------------------------
 #
 # unused_oids
+#    Finds blocks of manually-assignable oids that have not already been
+#    claimed by previous hackers.  The main use is for finding available
+#    OIDs for new internal functions.  The numbers printed are inclusive
+#    ranges of unused OIDs. It also reports any duplicate OIDs found.
 #
-# src/include/catalog/unused_oids
+#    Before using a large empty block, make sure you aren't about
+#    to take over what was intended as expansion space for something
+#    else.
 #
-#	finds blocks of manually-assignable oids that have not already been
-#	claimed by post_hackers.  primarily useful for finding available
-#	oids for new internal functions.  the numbers printed are inclusive
-#	ranges of unused oids.
+#    Note: You must pass the location of the Catalog.pm module to the
+#    Perl interpreter:
+#    perl -I /path/to/Catalog.pm /path/to/unused_oids
 #
-#	before using a large empty block, make sure you aren't about
-#	to take over what was intended as expansion space for something
-#	else.
+# Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+# Portions Copyright (c) 1994, Regents of the University of California
 #
-#	run this script in src/include/catalog.
+# src/include/catalog/unused_oids
 #
-use lib '../../backend/catalog/';
+#----------------------------------------------------------------------
+
 use Catalog;
+use File::Basename qw(dirname);
+my $catdir = dirname(__FILE__);
 
 use strict;
 use warnings;
 
-my @input_files = (glob("pg_*.h"), qw(indexing.h toasting.h));
+
+opendir(my $cd, $catdir)
+  or die "Can't opendir $catdir $!";
+my @filenames = grep { /(pg_\w+|toasting|indexing)\.h$/ } readdir($cd);
+closedir $cd;
 
 # Extract an array of all the OIDs assigned in the specified catalog headers
 # and their associated data files (if any).
 my @oids;
-foreach my $header (@input_files)
+foreach my $filename (@filenames)
 {
+	my $header = "$catdir/$filename";
+
 	$header =~ /(.+)\.h$/
 	  or die "Input files need to be header files.\n";
 	my $datfile = "$1.dat";
@@ -69,7 +83,7 @@ foreach my $header (@input_files)
 
 # Also push FirstBootstrapObjectId to serve as a terminator for the last gap.
 my $FirstBootstrapObjectId =
-  Catalog::FindDefinedSymbol('access/transam.h', [".."],
+  Catalog::FindDefinedSymbol('access/transam.h', ["$catdir/.."],
 	'FirstBootstrapObjectId');
 push @oids, $FirstBootstrapObjectId;
 
-- 
2.7.4

#16 Andreas Karlsson
andreas@proxel.se
In reply to: John Naylor (#15)
Re: unused_oids script is broken with bsd sed

On 04/26/2018 01:37 PM, John Naylor wrote:

Patch 0002 is my attempt at this. Not sure how portable this is. It's
also clunkier than before in that you need to tell it where to find
Catalog.pm, since it seems Perl won't compile unless "use lib" points
to a string and not an expression. Suggestions welcome. I also edited
the boilerplate comments.

What about using FindBin (http://perldoc.perl.org/FindBin.html)?

Andreas

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Karlsson (#16)
Re: unused_oids script is broken with bsd sed

Andreas Karlsson <andreas@proxel.se> writes:

What about using FindBin (http://perldoc.perl.org/FindBin.html)?

A quick check suggests that that's present in the core perl
distribution pretty far back, so I think it'll work.
Question is, is it any better than John's "dirname(__FILE__)"
solution?

regards, tom lane

#18 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: unused_oids script is broken with bsd sed

I wrote:

Andreas Karlsson <andreas@proxel.se> writes:

What about using FindBin (http://perldoc.perl.org/FindBin.html)?

A quick check suggests that that's present in the core perl
distribution pretty far back, so I think it'll work.
Question is, is it any better than John's "dirname(__FILE__)"
solution?

I experimented a bit, and found that actually what we seem to
need is FindBin::RealBin. That's the only one of these solutions
that gives the right answer if someone's put a symlink to
unused_oids or duplicate_oids into their PATH --- the other ones
give you the directory containing the symlink. Now, maybe that
would be an uncommon usage, but it hardly seems out of the question.
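
To spell that out, a minimal sketch of the FindBin-based approach (assuming the
scripts stay in src/include/catalog; details may differ from whatever finally
gets committed):

use strict;
use warnings;

use FindBin;
# RealBin resolves symlinks, so Catalog.pm is still found even when
# unused_oids is invoked through a symlink on the user's PATH
use lib "$FindBin::RealBin/../../backend/catalog";
use Catalog;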

regards, tom lane

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Naylor (#15)
Re: unused_oids script is broken with bsd sed

John Naylor <jcnaylor@gmail.com> writes:

On 4/26/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I notice that duplicate_oids is now a good factor of 10 slower than it was
before (~0.04 sec to ~0.4 sec, on my machine). While this doesn't seem
like a problem for manual use, it seems annoying as part of the standard
build process, especially on slower buildfarm critters.

If you're concerned about build speed, I think the generated *_d.h
headers cause a lot more files to be rebuilt than before. I'll think
about how to recover that.

Personally, I use ccache which doesn't seem to care too much, but I agree
that in some usages, extra touches of headers would be costly. Perhaps
it's worth adding logic to avoid overwriting an existing output file
unless it changed? I'm not sure; that would cost something too.

-removes duplicate_oids (not sure if you intended this, but since
unused_oids has already been committed to report duplicates, we may as
well standardize on that) and cleans up after that fact

I don't think anyone's asked for duplicate_oids to be removed altogether,
and I'm afraid that doing so would break existing workflows. For that
matter, now that I think about it, changing the behavior of unused_oids
as we did yesterday was ill-considered as well, so I undid that part.

I've pushed the current-directory-independence change by itself so that
we can see if any buildfarm members think it's problematic. (They'll
still be testing duplicate_oids, as things stand.) Once we have some
confirmation on the FindBin stuff actually being portable, I'll push
the changes to make genbki do duplicate checks and remove duplicate_oids
from the build sequence.

regards, tom lane

#20 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
Re: unused_oids script is broken with bsd sed

On Thu, Apr 26, 2018 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Personally, I use ccache which doesn't seem to care too much, but I agree
that in some usages, extra touches of headers would be costly. Perhaps
it's worth adding logic to avoid overwriting an existing output file
unless it changed? I'm not sure; that would cost something too.

It seems like a good idea to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#20)
1 attachment(s)
Re: unused_oids script is broken with bsd sed

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Apr 26, 2018 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Personally, I use ccache which doesn't seem to care too much, but I agree
that in some usages, extra touches of headers would be costly. Perhaps
it's worth adding logic to avoid overwriting an existing output file
unless it changed? I'm not sure; that would cost something too.

It seems like a good idea to me.

I took a quick look into this. It's very easy to do so far as the Perl
code is concerned: just teach Catalog::RenameTempFile that if the target
file already exists, run File::Compare::compare to see if its contents are
identical to the temp file, and if so unlink the temp file rather than
renaming. I've tried this, it works, and I can't measure any difference
in the runtime of genbki.pl (at least on my usual dev machine). So it
seems like there's little downside.

However, RenameTempFile is also used by Gen_fmgrtab.pl, and having the
same sort of no-touch semantics for fmgroids.h and friends would have some
additional fallout. The makefiles would think they have to keep
re-running Gen_fmgrtab.pl if fmgroids.h is older than the mod time on any
input file, and that's certainly no good. We can fix that by inventing a
stamp file for the Gen_fmgrtab.pl run, analogous to bki-stamp for the
genbki.pl run, but that means changes in the makefiles that go a bit
beyond the realm of triviality.

A compromise answer is to arrange things so that genbki.pl has no-touch
semantics but Gen_fmgrtab.pl doesn't, requiring either two support
functions in Catalog.pm or an extra argument to RenameTempFile. But
that's really ugly. Besides, if we're trying to avoid excess recompiles
due to useless touches of common header files, then failing to have
no-touch behavior for fmgroids.h is leaving a lot of money on the table.
So I don't like that answer at all.

Anyway, I'm happy to go make this happen; it looks like only another hour
or so's worth of work to fix the makefiles. But I wonder if anyone will
say this is too much churn for post-feature-freeze and we should shelve
it till v12.

regards, tom lane

Attachments:

wip-dont-overwrite-unchanged-headers.patch (text/x-diff)
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index eee7cb3..1b31cdd 100644
*** a/src/backend/catalog/Catalog.pm
--- b/src/backend/catalog/Catalog.pm
*************** package Catalog;
*** 16,21 ****
--- 16,24 ----
  use strict;
  use warnings;
  
+ use File::Compare;
+ 
+ 
  # Parses a catalog header file into a data structure describing the schema
  # of the catalog.
  sub ParseHeader
*************** sub AddDefaultValues
*** 336,350 ****
  }
  
  # Rename temporary files to final names.
! # Call this function with the final file name and the .tmp extension
  # Note: recommended extension is ".tmp$$", so that parallel make steps
! # can't use the same temp files
  sub RenameTempFile
  {
  	my $final_name = shift;
  	my $extension  = shift;
  	my $temp_name  = $final_name . $extension;
! 	rename($temp_name, $final_name) || die "rename: $temp_name: $!";
  }
  
  # Find a symbol defined in a particular header file and extract the value.
--- 339,367 ----
  }
  
  # Rename temporary files to final names.
! # Call this function with the final file name and the .tmp extension.
! #
! # If the final file already exists and has identical contents, don't
! # overwrite it; this behavior avoids unnecessary recompiles due to updating
! # the mod date on header files.
! #
  # Note: recommended extension is ".tmp$$", so that parallel make steps
! # can't use the same temp files.
  sub RenameTempFile
  {
  	my $final_name = shift;
  	my $extension  = shift;
  	my $temp_name  = $final_name . $extension;
! 
! 	if (-f $final_name &&
! 		compare($temp_name, $final_name) == 0)
! 	{
! 		unlink $temp_name || die "unlink: $temp_name: $!";
! 	}
! 	else
! 	{
! 		rename($temp_name, $final_name) || die "rename: $temp_name: $!";
! 	}
  }
  
  # Find a symbol defined in a particular header file and extract the value.
#22 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: unused_oids script is broken with bsd sed

Tom Lane wrote:

However, RenameTempFile is also used by Gen_fmgrtab.pl, and having the
same sort of no-touch semantics for fmgroids.h and friends would have some
additional fallout. The makefiles would think they have to keep
re-running Gen_fmgrtab.pl if fmgroids.h is older than the mod time on any
input file, and that's certainly no good. We can fix that by inventing a
stamp file for the Gen_fmgrtab.pl run, analogous to bki-stamp for the
genbki.pl run, but that means changes in the makefiles that go a bit
beyond the realm of triviality.

Sounds OK to me -- a stamp file is an already established technique, so it
shouldn't go *too much* beyond triviality. I do note that
msvc/Solution.pm runs Gen_fmgrtab.pl, but it shouldn't require any
changes anyhow.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23 Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: unused_oids script is broken with bsd sed

On Thu, May 3, 2018 at 3:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I'm happy to go make this happen; it looks like only another hour
or so's worth of work to fix the makefiles. But I wonder if anyone will
say this is too much churn for post-feature-freeze and we should shelve
it till v12.

I think we can be a bit more liberal about build system changes than
we would be with core code. So I'd say go for it.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#24 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#22)
Re: unused_oids script is broken with bsd sed

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

However, RenameTempFile is also used by Gen_fmgrtab.pl, and having the
same sort of no-touch semantics for fmgroids.h and friends would have some
additional fallout. The makefiles would think they have to keep
re-running Gen_fmgrtab.pl if fmgroids.h is older than the mod time on any
input file, and that's certainly no good. We can fix that by inventing a
stamp file for the Gen_fmgrtab.pl run, analogous to bki-stamp for the
genbki.pl run, but that means changes in the makefiles that go a bit
beyond the realm of triviality.

Sounds OK to me -- a stamp file is already established technique, so it
shouldn't go *too much* beyond triviality.

Yeah, what I'm envisioning is to change the makefile rules around these
files to look as much as possible like the ones around the BKI files,
which are (we hope) already debugged. So it doesn't seem like a high
risk change ... at least so far as the makefiles are concerned.

I do note that
msvc/Solution.pm runs Gen_fmgrtab.pl, but it shouldn't require any
changes anyhow.

Hmm. Actually, given the IsNewer checks there, it looks like Solution.pm
is basically hand-rolling makefile-like dependency checking, which means
it would be fooled by no-touch updates in the same way as make is, causing
rebuilds to do unnecessary work. We could live with that for awhile
maybe, but ultimately Solution.pm would need to be fixed to use a stamp
file like the makefile logic.

I could take a stab at that, but I don't have any way to test it myself.

regards, tom lane

#25 Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Tom Lane (#21)
Re: unused_oids script is broken with bsd sed

On 5/3/18 15:37, Tom Lane wrote:

I took a quick look into this. It's very easy to do so far as the Perl
code is concerned:

I think that in order to introduce warts like that, we have to have really
great savings. I haven't seen any actual analysis of what the current
problem is, other than one person expressing a suspicion.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#25)
Re: unused_oids script is broken with bsd sed

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 5/3/18 15:37, Tom Lane wrote:

I took a quick look into this. It's very easy to do so far as the Perl
code is concerned:

I think that in order to introduce warts like that, we have to have really
great savings. I haven't seen any actual analysis of what the current
problem is, other than one person expressing a suspicion.

Well, the work's already done, and personally I think the code is
cleaner after 9bf28f96c and bad51a49a regardless of whether there's any
performance benefit. You could call 1f1cd9b5d a wart if you wanted.
But we've done largely the same thing to pgindent, for one, in the past
year or so, and I find that to be a usability improvement, independently
of whether there's any build performance win. My editor gets cranky
when files change under it for no reason.

regards, tom lane

#27 John Naylor
jcnaylor@gmail.com
In reply to: Tom Lane (#26)
Re: unused_oids script is broken with bsd sed

On 5/4/18, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, the work's already done, and personally I think the code is
cleaner after 9bf28f96c and bad51a49a regardless of whether there's any
performance benefit. You could call 1f1cd9b5d a wart if you wanted.
But we've done largely the same thing to pgindent, for one, in the past
year or so, and I find that to be a usability improvement, independently
of whether there's any build performance win. My editor gets cranky
when files change under it for no reason.

Thanks for looking into that. I meant to do so, but it got stuck on
the back burner. I'm happy to report that a non-parallel make on my
old laptop goes from over 5 minutes to about 5 seconds when touching
a catalog file.

-John Naylor