[PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

Started by David Christensenover 10 years ago5 messages

david@endpoint.com

over 10 years ago

1 attachment(s)

Fixes a build issue I ran into while adding some columns to system tables:

Throws a build error if we encounter a different number of fields in a
DATA() line than we expect for the catalog in question.

Previously, it was possible to silently ignore any mismatches at build
time which could result in symbol undefined errors at link time. Now
we stop and identify the infringing line as soon as we encounter it,
which greatly speeds up the debugging process.

Attachments:

0001-Teach-Catalog.pm-how-many-attributes-there-should-be.patchapplication/octet-stream; name=0001-Teach-Catalog.pm-how-many-attributes-there-should-be.patchDownload

>From 7b5f737fe2538ec7b52e047d24b4f3f03c0fae95 Mon Sep 17 00:00:00 2001
From: David Christensen <david@endpoint.com>
Date: Mon, 5 Oct 2015 20:22:36 -0400
Subject: [PATCH] Teach Catalog.pm how many attributes there should be per
 DATA() line

Throws a build error if we encounter a different number of fields in a
DATA() line than we expect for the catalog in question.

Previously, it was possible to silently ignore any mismatches at build
time which could result in symbol undefined errors at link time.  Now
we stop and identify the infringing line as soon as we encounter it,
which greatly speeds up the debugging process.
---
 src/backend/catalog/Catalog.pm | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index 5e70418..41a83e2 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -46,6 +46,9 @@ sub Catalogs
 
 		open(INPUT_FILE, '<', $input_file) || die "$input_file: $!";
 
+		my ($catname) = ($input_file =~ m/(\w+)\.h$/);
+		my $natts_pat = "Natts_$catname";
+
 		# Scan the input file.
 		while (<INPUT_FILE>)
 		{
@@ -70,8 +73,15 @@ sub Catalogs
 			s/\s+/ /g;
 
 			# Push the data into the appropriate data structure.
-			if (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
+			if (/$natts_pat\s+(\d+)/)
+			{
+				$catalog{natts} = $1;
+			}
+			elsif (/^DATA\(insert(\s+OID\s+=\s+(\d+))?\s+\(\s*(.*)\s*\)\s*\)$/)
 			{
+				check_natts($catalog{natts},$3) or
+				  die sprintf "Wrong number of Natts in DATA() line %s:%d\n", $input_file, INPUT_FILE->input_line_number;
+
 				push @{ $catalog{data} }, { oid => $2, bki_values => $3 };
 			}
 			elsif (/^DESCR\(\"(.*)\"\)$/)
@@ -216,4 +226,19 @@ sub RenameTempFile
 	rename($temp_name, $final_name) || die "rename: $temp_name: $!";
 }
 
+# verify the number of fields in the passed-in bki structure
+sub check_natts
+{
+	my ($natts, $bki_val) = @_;
+	unless ($natts)
+	{
+		warn "No Natts defined yet, silently skipping check...\n";
+		return 1;
+	}
+
+	# we're working with a copy and need to count the fields only, so collapse
+	$bki_val =~ s/"[^"]*?"/xxx/g;
+
+	return (split /\s+/ => $bki_val) == $natts;
+}
 1;
-- 
2.3.5

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: David Christensen (#1)

Re: [PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

On Tue, Oct 6, 2015 at 9:15 AM, David Christensen <david@endpoint.com> wrote:

Fixes a build issue I ran into while adding some columns to system tables:

Throws a build error if we encounter a different number of fields in a
DATA() line than we expect for the catalog in question.

Previously, it was possible to silently ignore any mismatches at build
time which could result in symbol undefined errors at link time. Now
we stop and identify the infringing line as soon as we encounter it,
which greatly speeds up the debugging process.

I think this is a GREAT idea, but this line made me laugh[1]Because if you're producing a warning, it's not silent!:

+ warn "No Natts defined yet, silently skipping check...\n";

I suggest that we make that a fatal error. Like "Could not find
definition Natts_pg_proc before start of DATA".

Secondly, I don't think we should check this at this point in the
code, because then it blindly affects everybody who uses Catalog.pm.
Let's pick one specific place to do this check. I suggest genbki.pl,
inside the loop with this comment: "# Ordinary catalog with DATA
line(s)"

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: Because if you're producing a warning, it's not silent!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Christensen

david@endpoint.com

over 10 years ago

In reply to: Robert Haas (#2)

Re: [PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

On Oct 8, 2015, at 11:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Oct 6, 2015 at 9:15 AM, David Christensen <david@endpoint.com> wrote:

Fixes a build issue I ran into while adding some columns to system tables:

Throws a build error if we encounter a different number of fields in a
DATA() line than we expect for the catalog in question.

Previously, it was possible to silently ignore any mismatches at build
time which could result in symbol undefined errors at link time. Now
we stop and identify the infringing line as soon as we encounter it,
which greatly speeds up the debugging process.

I think this is a GREAT idea, but this line made me laugh[1]:

+ warn "No Natts defined yet, silently skipping check...\n";

I suggest that we make that a fatal error. Like "Could not find
definition Natts_pg_proc before start of DATA”.

That’s fine with me; my main consideration was to make sure nothing broke in the status quo, including dependencies I was unaware of.

Secondly, I don't think we should check this at this point in the
code, because then it blindly affects everybody who uses Catalog.pm.
Let's pick one specific place to do this check. I suggest genbki.pl,
inside the loop with this comment: "# Ordinary catalog with DATA
line(s)"

I’m happy to move it around, but If everything is in order, how will this affect things at all? If we’re in a good state this condition should never trigger.
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
david@endpoint.com
785-727-1171

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: David Christensen (#3)

Re: [PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

On Thu, Oct 8, 2015 at 12:43 PM, David Christensen <david@endpoint.com> wrote:

I’m happy to move it around, but If everything is in order, how will this affect things at all? If we’re in a good state this condition should never trigger.

Right, but I think it ought to be Catalog.pm's job to parse the config
file. The job of complaining about what it contains is a job worth
doing, but it's not the same job. Personally, I hate it when parsers
take it upon themselves to do semantic analysis, because then what
happens if you want to write, say, a tool to repair a broken file?
You need to be able to read it in without erroring out so that you can
frob whatever's busted and write it back out, and the parser is
getting in your way. Maybe that's not going to come up here, but I'm
just explaining my general philosophy on these things...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Christensen

david@endpoint.com

over 10 years ago

In reply to: Robert Haas (#4)

Re: [PATCH] Teach Catalog.pm how many attributes there should be per DATA() line

On Oct 9, 2015, at 2:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Oct 8, 2015 at 12:43 PM, David Christensen <david@endpoint.com> wrote:

I’m happy to move it around, but If everything is in order, how will this affect things at all? If we’re in a good state this condition should never trigger.

Right, but I think it ought to be Catalog.pm's job to parse the config
file. The job of complaining about what it contains is a job worth
doing, but it's not the same job. Personally, I hate it when parsers
take it upon themselves to do semantic analysis, because then what
happens if you want to write, say, a tool to repair a broken file?
You need to be able to read it in without erroring out so that you can
frob whatever's busted and write it back out, and the parser is
getting in your way. Maybe that's not going to come up here, but I'm
just explaining my general philosophy on these things…

Not disagreeing with you in general, but this is a very specific use case and I think we lose the niceness of being able to tie back to the specific line number for the file in question—the alternative being to track that information as well in a separate structure which we then pass around, which seems like overkill.

The only two consumers of the catalog-specific data lines (at least via direct access in Perl) are genbki.pl and Gen_fmgtab.pl. We would need to add these checks anyway in both call sites, so to me it seems important to bail early if we see any issues, so I think putting the failure as soon as we notice it with as much context to fix it (i.e., as written) is the right choice. We can certainly pretty up the messages.

The consistency of the system catalogs in the development state is something that is fundamental to whether there is any information that is sensible to query, and by definition if we are missing columns in the data rows this is a mistake and whatever parsed data in here will be worse than useless (as who knows the order of the missing column, data can/will end up being misaligned). Thus I don’t believe that we’d want other (hypothetical) Catalog.pm consumers to try to use data that we know is bad.
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
david@endpoint.com
785-727-1171

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers