Bootstrap DATA is a pita
Hi,
I've for a long while been rather annoyed about how cumbersome it
is to add catalog rows using the bootstrap format. Especially pg_proc.h,
pg_operator.h, pg_amop.h, pg_amproc.h and some others are really unwieldy.
I think this needs to be improved. And while I'm not going to start
working on it tonight, I do plan to work on it if we can agree on a
design that I think is worth implementing.
The things that bug me most are:
1) When adding new rows it's rather hard to know which columns are which,
and you have to specify a lot of values you really don't care about.
Especially in pg_proc that's rather annoying.
2) Having to assign oids for many things that don't actually need one is
bothersome and greatly increases the likelihood of conflicts. There are
some rows for which we need fixed oids (pg_type ones, for example),
but e.g. for the majority of pg_proc it's unnecessary.
3) Adding a new column to a system catalog, especially pg_proc.h,
basically requires writing a complex regex or program to modify the
header.
Therefore I propose that we add another format to generate the .bki
insert lines.
What I think we should do is add pg_<catalog>.data files that contain
the actual data and are automatically parsed by Catalog.pm. Those would
contain the rows in some to-be-decided format. I was considering using
JSON, but it turns out only perl 5.14 started shipping JSON::PP as part
of the standard library. So I guess it's best we just make it a big perl
array of hashes.
To address 1) we just need to make each row a hash and allow leaving out
columns that have some default value.
2) is a bit more complex. Generally many rows don't need a fixed oid at
all, and many others primarily need it to handle object descriptions. The
latter seems best solved by not making descriptions dependent on the oid
anymore.
3) Seems primarily solved by not requiring default values to be
specified anymore. Also it should be much easier to add new values
automatically to a parseable format.
I think we'll need to generate oid #defines for some catalog contents,
but that seems solvable.
Maybe something roughly like:
# pg_type.data
CatalogData(
    'pg_type',
    [
        {
            oid => 2249,
            data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},
            oiddefine => 'CSTRINGOID',
        },
    ]
);
# pg_proc.data
CatalogData(
    'pg_proc',
    [
        {
            oid => 1242,
            data => {proname => 'boolin', prorettype => 16, proargtypes => [2275], provolatile => 'i'},
            description => 'I/O',
        },
        {
            data => {proname => 'mode_final', prorettype => 2283, proargtypes => [2281, 2283]},
            description => 'aggregate final function',
        },
    ]
);
There'd need to be some logic to assign default values for columns, and
maybe even simple derivation logic, e.g. to determine values like
pronargs based on proargtypes.
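To make that concrete, here's a minimal sketch of what such default and
derivation logic could look like (the %pg_proc_defaults table, its
contents and the helper name are just assumptions for illustration):

    # Hypothetical helper: fill in per-column defaults, then derive
    # pronargs from proargtypes instead of hand-maintaining it.
    my %pg_proc_defaults = (
        pronamespace => 'PGNSP',
        proowner     => 'PGUID',
        provolatile  => 'v',
    );

    sub complete_pg_proc_row
    {
        my ($row) = @_;
        while (my ($col, $default) = each %pg_proc_defaults)
        {
            $row->{$col} = $default unless defined $row->{$col};
        }
        # derived, not stored: the argument count follows from the types
        $row->{pronargs} = scalar @{ $row->{proargtypes} || [] };
        return $row;
    }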
This is far from fully thought through, but I think something very
roughly along these lines could be a remarkable improvement in the ease
of adding new catalog contents.
Comments?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02/20/2015 03:41 PM, Andres Freund wrote:
What I think we should do is add pg_<catalog>.data files that contain
the actual data and are automatically parsed by Catalog.pm. Those would
contain the rows in some to-be-decided format. I was considering using
JSON, but it turns out only perl 5.14 started shipping JSON::PP as part
of the standard library. So I guess it's best we just make it a big perl
array of hashes.
What about YAML? That might have been added somewhat earlier.
Or what about just doing CSV?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2/20/15 8:46 PM, Josh Berkus wrote:
What about YAML? That might have been added somewhat earlier.
YAML isn't included in Perl, but there is
Module::Build::YAML - Provides just enough YAML support so that
Module::Build works even if YAML.pm is not installed
which might work.
Or what about just doing CSV?
I don't think that would actually address the problems. It would just
be the same format as now with different delimiters.
I violently support this proposal.
Maybe something roughly like:
# pg_type.data
CatalogData(
    'pg_type',
    [
        {
            oid => 2249,
            data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},
            oiddefine => 'CSTRINGOID',
        },
    ]
);
One concern I have with this is that in my experience different tools
and editors have vastly different ideas on how to format these kinds of
nested structures. I'd try out YAML, or even a homemade fake YAML over
this.
On 21/02/15 04:22, Peter Eisentraut wrote:
I violently support this proposal.
Maybe something roughly like:
# pg_type.data
CatalogData(
    'pg_type',
    [
        {
            oid => 2249,
            data => {typname => 'cstring', typlen => -2, typbyval => 1, fake => '...'},
            oiddefine => 'CSTRINGOID',
        },
    ]
);
One concern I have with this is that in my experience different tools
and editors have vastly different ideas on how to format these kinds of
nested structures. I'd try out YAML, or even a homemade fake YAML over
this.
+1 for the idea and +1 for YAML(-like) syntax.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
On 2/20/15 8:46 PM, Josh Berkus wrote:
What about YAML? That might have been added somewhat earlier.
YAML isn't included in Perl, but there is
Module::Build::YAML - Provides just enough YAML support so that
Module::Build works even if YAML.pm is not installed
I'm afraid not:
sub Load {
    shift if ($_[0] eq __PACKAGE__ || ref($_[0]) eq __PACKAGE__);
    die "not yet implemented";
}
Or what about just doing CSV?
I don't think that would actually address the problems. It would just
be the same format as now with different delimiters.
Yea, we need hierarchies and named keys.
One concern I have with this is that in my experience different tools
and editors have vastly different ideas on how to format these kinds of
nested structures. I'd try out YAML, or even a homemade fake YAML over
this.
Yes, that's a good point. I have zero desire to open-code a format
though, I think that's a bad idea. We could say we just include
YAML::Tiny, that's what it's made for.
To allow for changing things programmatically without noise I was
wondering whether we shouldn't just load/dump the file at some point of
the build process. Then we're sure the indentation is correct and it can
be changed programmatically without requiring manual fixup of comments.
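As a sketch of that load/dump step, using only the core Data::Dumper
module (this assumes the .data file evaluates to the row structure when
loaded with do; note that a naive round-trip like this would drop
free-standing comments, so anything worth keeping would have to live in
the data itself):

    use strict;
    use warnings;
    use Data::Dumper;

    # Re-dump a plain-perl data file in one canonical layout, so the
    # indentation never depends on anybody's editor settings.
    sub canonicalize_data_file
    {
        my ($path) = @_;
        my $data = do $path or die "could not load $path: $@";

        local $Data::Dumper::Indent   = 1;    # fixed indentation style
        local $Data::Dumper::Sortkeys = 1;    # stable key order
        local $Data::Dumper::Terse    = 1;    # no '$VAR1 =' prefix

        open my $fh, '>', $path or die "could not write $path: $!";
        print $fh Dumper($data);
        close $fh;
    }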
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02/21/2015 05:04 AM, Andres Freund wrote:
Yes, that's a good point. I have zero desire to open-code a format
though, I think that's a bad idea. We could say we just include
YAML::Tiny, that's what it's made for.
Personally, I think I would prefer that we use JSON (and yes, there's a
JSON::Tiny module, which definitely lives up to its name).
For one thing, we've made a feature of supporting JSON, so arguably we
should eat the same dog food.
I also dislike YAML's line-oriented format. I'd like to be able to add a
pg_proc entry in a handful of lines instead of 29 or more (pg_proc has
27 attributes, but some of them are arrays, and there's an oid and in
most cases a description to add as well). We could reduce that number by
defaulting some of the attributes (pronamespace, proowner and prolang,
for example) and possibly inferring others (pronargs?). Even so it's
going to take up lots of lines of vertical screen real estate. A JSON
format could be more vertically compact. The price for that is that JSON
strings have to be quoted, which I know lots of people hate.
cheers
andrew
On 02/21/2015 09:39 AM, Andrew Dunstan wrote:
On 02/21/2015 05:04 AM, Andres Freund wrote:
Yes, that's a good point. I have zero desire to open-code a format
though, I think that's a bad idea. We could say we just include
YAML::Tiny, that's what it's made for.
Personally, I think I would prefer that we use JSON (and yes, there's
a JSON::Tiny module, which definitely lives up to its name).
For one thing, we've made a feature of supporting JSON, so arguably we
should eat the same dog food.
I also dislike YAML's line-oriented format. I'd like to be able to add
a pg_proc entry in a handful of lines instead of 29 or more (pg_proc
has 27 attributes, but some of them are arrays, and there's an oid and
in most cases a description to add as well). We could reduce that
number by defaulting some of the attributes (pronamespace, proowner
and prolang, for example) and possibly inferring others (pronargs?).
Even so it's going to take up lots of lines of vertical screen real
estate. A JSON format could be more vertically compact. The price for
that is that JSON strings have to be quoted, which I know lots of
people hate.
Followup:
The YAML spec does support explicit flows like JSON, which would
overcome my objections above, but unfortunately these are not supported
by YAML::Tiny.
cheers
andrew
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
On 2/20/15 8:46 PM, Josh Berkus wrote:
Or what about just doing CSV?
I don't think that would actually address the problems. It would just
be the same format as now with different delimiters.
Yea, we need hierarchies and named keys.
Yeah. One thought though is that I don't think we need the "data" layer
in your proposal; that is, I'd flatten the representation to something
more like
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
This will be easier to edit, either manually or programmatically I think.
The code that turns it into a .bki file will need to know the exact set
of columns in each system catalog, but it would have had to know that
anyway I believe, if you're expecting it to insert default values.
Ideally the column defaults could come from BKI_ macros in the catalog/*.h
files; it would be good if we could keep those files as the One Source of
Truth for catalog schema info, even as we split out the initial data.
regards, tom lane
On 2015-02-21 11:34:09 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
On 2/20/15 8:46 PM, Josh Berkus wrote:
Or what about just doing CSV?
I don't think that would actually address the problems. It would just
be the same format as now with different delimiters.
Yea, we need hierarchies and named keys.
Yeah. One thought though is that I don't think we need the "data" layer
in your proposal; that is, I'd flatten the representation to something
more like
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
I don't really like that - then stuff like oid, description, comment (?)
have to not conflict with any catalog columns. I think it's easier to
have them separate.
This will be easier to edit, either manually or programmatically I think.
The code that turns it into a .bki file will need to know the exact set
of columns in each system catalog, but it would have had to know that
anyway I believe, if you're expecting it to insert default values.
There'll need to be some awareness of columns, sure. But I think
programmatically editing the values will still be simpler if you don't
need to discern whether a key is a column or some genbki-specific value.
Ideally the column defaults could come from BKI_ macros in the catalog/*.h
files; it would be good if we could keep those files as the One Source of
Truth for catalog schema info, even as we split out the initial data.
Hm, yea.
One thing I was considering was to do the regtype and regproc lookups
directly in the tool. That'd have two advantages: 1) it'd make it
possible to refer to typenames in pg_proc, 2) It'd be much faster. Right
now most of initdb's time is spent doing syscache lookups during
bootstrap, because it can't use indexes... A simple hash lookup during
bki generation could lead to quite measurable savings during lookup.
We could then even rip the bootstrap code out of regtypein/regprocin...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andrew Dunstan <andrew@dunslane.net> writes:
On 02/21/2015 09:39 AM, Andrew Dunstan wrote:
Personally, I think I would prefer that we use JSON (and yes, there's
a JSON::Tiny module, which definitely lives up to its name).
For one thing, we've made a feature of supporting JSON, so arguably we
should eat the same dog food.
We've also made a feature of supporting XML, and a lot earlier, so that
argument seems pretty weak.
My only real requirement on the format choice is that it should absolutely
not require any Perl module that's not in a bog-standard installation.
I've gotten the buildfarm code running on several ancient machines now and
in most cases getting the module dependencies dealt with was pure hell.
No non-core modules for a basic build please. I don't care whether they
are "tiny".
regards, tom lane
On 02/21/2015 11:43 AM, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 02/21/2015 09:39 AM, Andrew Dunstan wrote:
Personally, I think I would prefer that we use JSON (and yes, there's
a JSON::Tiny module, which definitely lives up to its name).
For one thing, we've made a feature of supporting JSON, so arguably we
should eat the same dog food.
We've also made a feature of supporting XML, and a lot earlier, so that
argument seems pretty weak.
Fair enough
My only real requirement on the format choice is that it should absolutely
not require any Perl module that's not in a bog-standard installation.
I've gotten the buildfarm code running on several ancient machines now and
in most cases getting the module dependencies dealt with was pure hell.
No non-core modules for a basic build please. I don't care whether they
are "tiny".
The point about using the "tiny" modules is that they are so small and
self-contained they can either be reasonably shipped with our code or
embedded directly in the script that uses them, so no extra build
dependency would be created.
However, I rather like your suggestion of this:
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
which is pure perl syntax and wouldn't need any extra module, and has
the advantage over JSON that key names won't need to be quoted, making
it more readable.
cheers
andrew
On February 21, 2015 7:20:04 PM CET, Andrew Dunstan <andrew@dunslane.net> wrote:
On 02/21/2015 11:43 AM, Tom Lane wrote:
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
which is pure perl syntax and wouldn't need any extra module, and has
the advantage over JSON that key names won't need to be quoted, making
it more readable.
Yea, my original post suggested using actual perl hashes to avoid problems with the availability of libraries. So far I've not really heard a convincing alternative. Peter's problem with formatting seems to be most easily solved by rewriting the file automatically...
Andres
--
Please excuse brevity and formatting - I am writing this on my mobile phone.
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2015-02-21 17:43:09 +0100, Andres Freund wrote:
One thing I was considering was to do the regtype and regproc lookups
directly in the tool. That'd have two advantages: 1) it'd make it
possible to refer to typenames in pg_proc, 2) It'd be much faster. Right
now most of initdb's time is spent doing syscache lookups during
bootstrap, because it can't use indexes... A simple hash lookup during
bki generation could lead to quite measurable savings during lookup.
I've *very* quickly hacked this up. Doing this for all regproc columns
gives a consistent speedup in an assert-enabled build, from ~0m3.589s to
~0m2.544s. My guess is that the relative speedup in optimized mode would
actually be even bigger, as now most of the time is spent in
AtEOXact_CatCache.
Given that pg_proc is unlikely to get any smaller and that the current
code is essentially O(lookups * #pg_proc), this alone seems to be worth
a good bit.
The same trick should also allow us to simply refer to type names in
pg_proc et al. If we had a way to denote a column being of type
relnamespace/relauthid we could replace
$row->{bki_values} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
$row->{bki_values} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
as well.
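For instance, branches along these lines could slot into the second pass
of the attached patch (relnamespace/relauthid are the hypothetical type
names from above; $PG_CATALOG_NAMESPACE and $BOOTSTRAP_SUPERUSERID
already exist in genbki.pl):

    # Sketch: resolve bootstrap placeholders based on the declared
    # column type instead of blanket regexes over the whole line.
    sub resolve_placeholder
    {
        my ($coltype, $value) = @_;
        return $PG_CATALOG_NAMESPACE
            if $coltype eq 'relnamespace' && $value eq 'PGNSP';
        return $BOOTSTRAP_SUPERUSERID
            if $coltype eq 'relauthid' && $value eq 'PGUID';
        return $value;
    }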
The changes in pg_proc.h are just to demonstrate that using names
instead of oids works.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment: 0001-WIP-resolve-regtype-regproc-in-genbki.pl.patch (text/x-patch)
From 39e6d60969327575b4797186c4577df8edd21fa5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 22 Feb 2015 00:06:18 +0100
Subject: [PATCH] WIP: resolve regtype/regproc in genbki.pl
Faster, and allows us to rely on them earlier.
---
src/backend/catalog/Catalog.pm | 15 ++++++
src/backend/catalog/genbki.pl | 91 +++++++++++++++++++++++++++++---
src/backend/utils/adt/regproc.c | 82 +++-------------------------
src/include/c.h | 3 ++
src/include/catalog/pg_proc.h | 30 +++++------
src/test/regress/expected/opr_sanity.out | 8 +--
6 files changed, 128 insertions(+), 101 deletions(-)
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index c7b1c17..64af70b 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -196,6 +196,21 @@ sub Catalogs
}
}
}
+
+ # allow looking up columns by name
+ $catalog{columns_byname} = {};
+ my @columnnames;
+ my @columntypes;
+
+ foreach my $column (@{ $catalog{columns} })
+ {
+ $catalog{columns_byname}{$column->{'name'}} = $column;
+ push @columnnames, $column->{'name'};
+ push @columntypes, $column->{'type'};
+ }
+ $catalog{columnnames} = \@columnnames;
+ $catalog{columntypes} = \@columntypes;
+
$catalogs{$catname} = \%catalog;
close INPUT_FILE;
}
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index a5c78ee..8c55fc1 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -104,6 +104,89 @@ my %schemapg_entries;
my @tables_needing_macros;
our @types;
+my %catalogs_by_name;
+
+# in a first pass, parse data and build some lookup tables
+foreach my $catname (@{ $catalogs->{names} })
+{
+ my $catalog = $catalogs->{$catname};
+ my %byname;
+ my %byoid;
+ my $name;
+
+ # Column to use for lookup mapping
+ if ($catname eq 'pg_type')
+ {
+ $name = 'typname';
+ }
+ elsif ($catname eq 'pg_proc')
+ {
+ $name = 'proname';
+ }
+
+ foreach my $row (@{ $catalog->{data} })
+ {
+ my %valuesbyname;
+
+ # substitute constant values we acquired above
+ $row->{bki_values} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
+ $row->{bki_values} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
+
+ # split data into actual columns
+ my @values = split /\s+/, $row->{bki_values};
+
+ # store values in a more useful format
+ $row->{values} = \@values;
+
+ # build lookup table if necessary
+ if ($name and defined $row->{oid})
+ {
+ @valuesbyname{ @{ $catalog->{columnnames} } } = @values;
+ $byname{$valuesbyname{$name}} = $row->{oid};
+ $byoid{$row->{oid}} = $valuesbyname{$name};
+ }
+ }
+ if (%byname)
+ {
+ $catalog->{byname} = \%byname;
+ $catalog->{byoid} = \%byoid;
+ }
+}
+
+# in a second pass, resolve references and similar things in the data
+foreach my $catname (@{ $catalogs->{names} })
+{
+ my $catalog = $catalogs->{$catname};
+
+ foreach my $row (@{ $catalog->{data} })
+ {
+ my $colno = 0;
+ foreach my $column (@{ $catalog->{columns} })
+ {
+ my $value = $row->{values}->[$colno];
+
+ if ($column->{type} eq 'regproc')
+ {
+ if ($value ne '-' && $value !~ /^\d+$/)
+ {
+ my $replacement = $catalogs->{pg_proc}->{byname}->{$value};
+ $row->{values}->[$colno] = $replacement;
+ }
+ }
+ elsif ($column->{type} eq 'regtype')
+ {
+ if ($value ne '-' && $value !~ /^\d+$/)
+ {
+ my $replacement = $catalogs->{pg_type}->{byname}->{$value};
+ $row->{values}->[$colno] = $replacement;
+ }
+ }
+
+ $colno++;
+ }
+ }
+}
+
# produce output, one catalog at a time
foreach my $catname (@{ $catalogs->{names} })
{
@@ -160,10 +243,6 @@ foreach my $catname (@{ $catalogs->{names} })
foreach my $row (@{ $catalog->{data} })
{
- # substitute constant values we acquired above
- $row->{bki_values} =~ s/\bPGUID\b/$BOOTSTRAP_SUPERUSERID/g;
- $row->{bki_values} =~ s/\bPGNSP\b/$PG_CATALOG_NAMESPACE/g;
-
# Save pg_type info for pg_attribute processing below
if ($catname eq 'pg_type')
{
@@ -175,9 +254,9 @@ foreach my $catname (@{ $catalogs->{names} })
# Write to postgres.bki
my $oid = $row->{oid} ? "OID = $row->{oid} " : '';
- printf BKI "insert %s( %s)\n", $oid, $row->{bki_values};
+ printf BKI "insert %s( %s)\n", $oid, join(' ', @{$row->{values}});
- # Write comments to postgres.description and postgres.shdescription
+ # Write comments to postgres.description and postgres.shdescription
if (defined $row->{descr})
{
printf DESCR "%s\t%s\t0\t%s\n", $row->{oid}, $catname,
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 3d1bb32..f7c99ff 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -84,51 +84,11 @@ regprocin(PG_FUNCTION_ARGS)
/* Else it's a name, possibly schema-qualified */
/*
- * In bootstrap mode we assume the given name is not schema-qualified, and
- * just search pg_proc for a unique match. This is needed for
- * initializing other system catalogs (pg_namespace may not exist yet, and
- * certainly there are no schemas other than pg_catalog).
+ * We should never get here in bootstrap mode, as all references should
+ * have been resolved by genbki.pl.
*/
if (IsBootstrapProcessingMode())
- {
- int matches = 0;
- Relation hdesc;
- ScanKeyData skey[1];
- SysScanDesc sysscan;
- HeapTuple tuple;
-
- ScanKeyInit(&skey[0],
- Anum_pg_proc_proname,
- BTEqualStrategyNumber, F_NAMEEQ,
- CStringGetDatum(pro_name_or_oid));
-
- hdesc = heap_open(ProcedureRelationId, AccessShareLock);
- sysscan = systable_beginscan(hdesc, ProcedureNameArgsNspIndexId, true,
- NULL, 1, skey);
-
- while (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
- {
- result = (RegProcedure) HeapTupleGetOid(tuple);
- if (++matches > 1)
- break;
- }
-
- systable_endscan(sysscan);
- heap_close(hdesc, AccessShareLock);
-
- if (matches == 0)
- ereport(ERROR,
- (errcode(ERRCODE_UNDEFINED_FUNCTION),
- errmsg("function \"%s\" does not exist", pro_name_or_oid)));
-
- else if (matches > 1)
- ereport(ERROR,
- (errcode(ERRCODE_AMBIGUOUS_FUNCTION),
- errmsg("more than one function named \"%s\"",
- pro_name_or_oid)));
-
- PG_RETURN_OID(result);
- }
+ elog(ERROR, "regprocin with textual values is not supported in bootstrap mode");
/*
* Normal case: parse the name into components and see if it matches any
@@ -1196,41 +1156,11 @@ regtypein(PG_FUNCTION_ARGS)
/* Else it's a type name, possibly schema-qualified or decorated */
/*
- * In bootstrap mode we assume the given name is not schema-qualified, and
- * just search pg_type for a match. This is needed for initializing other
- * system catalogs (pg_namespace may not exist yet, and certainly there
- * are no schemas other than pg_catalog).
+ * We should never get here in bootstrap mode, as all references should
+ * have been resolved by genbki.pl.
*/
if (IsBootstrapProcessingMode())
- {
- Relation hdesc;
- ScanKeyData skey[1];
- SysScanDesc sysscan;
- HeapTuple tuple;
-
- ScanKeyInit(&skey[0],
- Anum_pg_type_typname,
- BTEqualStrategyNumber, F_NAMEEQ,
- CStringGetDatum(typ_name_or_oid));
-
- hdesc = heap_open(TypeRelationId, AccessShareLock);
- sysscan = systable_beginscan(hdesc, TypeNameNspIndexId, true,
- NULL, 1, skey);
-
- if (HeapTupleIsValid(tuple = systable_getnext(sysscan)))
- result = HeapTupleGetOid(tuple);
- else
- ereport(ERROR,
- (errcode(ERRCODE_UNDEFINED_OBJECT),
- errmsg("type \"%s\" does not exist", typ_name_or_oid)));
-
- /* We assume there can be only one match */
-
- systable_endscan(sysscan);
- heap_close(hdesc, AccessShareLock);
-
- PG_RETURN_OID(result);
- }
+ elog(ERROR, "regtypein with textual values is not supported in bootstrap mode");
/*
* Normal case: invoke the full parser to deal with special cases such as
diff --git a/src/include/c.h b/src/include/c.h
index ee615ee..292d843 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -347,6 +347,9 @@ typedef double float8;
typedef Oid regproc;
typedef regproc RegProcedure;
+
+typedef Oid regtype;
+
typedef uint32 TransactionId;
typedef uint32 LocalTransactionId;
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 4268b99..712b53c 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -52,7 +52,7 @@ CATALOG(pg_proc,1255) BKI_BOOTSTRAP BKI_ROWTYPE_OID(81) BKI_SCHEMA_MACRO
char provolatile; /* see PROVOLATILE_ categories below */
int16 pronargs; /* number of arguments */
int16 pronargdefaults; /* number of arguments with defaults */
- Oid prorettype; /* OID of result type */
+ regtype prorettype; /* OID of result type */
/*
* variable-length fields start here, but we allow direct access to
@@ -153,7 +153,7 @@ DATA(insert OID = 1245 ( charin PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 18
DESCR("I/O");
DATA(insert OID = 33 ( charout PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2275 "18" _null_ _null_ _null_ _null_ charout _null_ _null_ _null_ ));
DESCR("I/O");
-DATA(insert OID = 34 ( namein PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 19 "2275" _null_ _null_ _null_ _null_ namein _null_ _null_ _null_ ));
+DATA(insert OID = 34 ( namein PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 name "2275" _null_ _null_ _null_ _null_ namein _null_ _null_ _null_ ));
DESCR("I/O");
DATA(insert OID = 35 ( nameout PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2275 "19" _null_ _null_ _null_ _null_ nameout _null_ _null_ _null_ ));
DESCR("I/O");
@@ -674,11 +674,11 @@ DATA(insert OID = 401 ( text PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25
DESCR("convert char(n) to text");
DATA(insert OID = 406 ( text PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 25 "19" _null_ _null_ _null_ _null_ name_text _null_ _null_ _null_ ));
DESCR("convert name to text");
-DATA(insert OID = 407 ( name PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 19 "25" _null_ _null_ _null_ _null_ text_name _null_ _null_ _null_ ));
+DATA(insert OID = 407 ( name PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 name "25" _null_ _null_ _null_ _null_ text_name _null_ _null_ _null_ ));
DESCR("convert text to name");
DATA(insert OID = 408 ( bpchar PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 1042 "19" _null_ _null_ _null_ _null_ name_bpchar _null_ _null_ _null_ ));
DESCR("convert name to char(n)");
-DATA(insert OID = 409 ( name PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 19 "1042" _null_ _null_ _null_ _null_ bpchar_name _null_ _null_ _null_ ));
+DATA(insert OID = 409 ( name PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 name "1042" _null_ _null_ _null_ _null_ bpchar_name _null_ _null_ _null_ ));
DESCR("convert char(n) to name");
DATA(insert OID = 440 ( hashgettuple PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 16 "2281 2281" _null_ _null_ _null_ _null_ hashgettuple _null_ _null_ _null_ ));
@@ -819,7 +819,7 @@ DATA(insert OID = 680 ( oidvectorge PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0
DATA(insert OID = 681 ( oidvectorgt PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "30 30" _null_ _null_ _null_ _null_ oidvectorgt _null_ _null_ _null_ ));
/* OIDS 700 - 799 */
-DATA(insert OID = 710 ( getpgusername PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ current_user _null_ _null_ _null_ ));
+DATA(insert OID = 710 ( getpgusername PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ current_user _null_ _null_ _null_ ));
DESCR("deprecated, use current_user instead");
DATA(insert OID = 716 ( oidlt PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "26 26" _null_ _null_ _null_ _null_ oidlt _null_ _null_ _null_ ));
DATA(insert OID = 717 ( oidle PGNSP PGUID 12 1 0 0 0 f f f t t f i 2 0 16 "26 26" _null_ _null_ _null_ _null_ oidle _null_ _null_ _null_ ));
@@ -851,9 +851,9 @@ DATA(insert OID = 741 ( text_le PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16
DATA(insert OID = 742 ( text_gt PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "25 25" _null_ _null_ _null_ _null_ text_gt _null_ _null_ _null_ ));
DATA(insert OID = 743 ( text_ge PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "25 25" _null_ _null_ _null_ _null_ text_ge _null_ _null_ _null_ ));
-DATA(insert OID = 745 ( current_user PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ current_user _null_ _null_ _null_ ));
+DATA(insert OID = 745 ( current_user PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ current_user _null_ _null_ _null_ ));
DESCR("current user name");
-DATA(insert OID = 746 ( session_user PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ session_user _null_ _null_ _null_ ));
+DATA(insert OID = 746 ( session_user PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ session_user _null_ _null_ _null_ ));
DESCR("session user name");
DATA(insert OID = 744 ( array_eq PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "2277 2277" _null_ _null_ _null_ _null_ array_eq _null_ _null_ _null_ ));
@@ -1017,7 +1017,7 @@ DATA(insert OID = 859 ( namenlike PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0
DATA(insert OID = 860 ( bpchar PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 1042 "18" _null_ _null_ _null_ _null_ char_bpchar _null_ _null_ _null_ ));
DESCR("convert char to char(n)");
-DATA(insert OID = 861 ( current_database PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ current_database _null_ _null_ _null_ ));
+DATA(insert OID = 861 ( current_database PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ current_database _null_ _null_ _null_ ));
DESCR("name of the current database");
DATA(insert OID = 817 ( current_query PGNSP PGUID 12 1 0 0 0 f f f f f f v 0 0 25 "" _null_ _null_ _null_ _null_ current_query _null_ _null_ _null_ ));
DESCR("get the currently executing query");
@@ -1605,12 +1605,12 @@ DESCR("absolute value");
/* OIDS 1400 - 1499 */
-DATA(insert OID = 1400 ( name PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 19 "1043" _null_ _null_ _null_ _null_ text_name _null_ _null_ _null_ ));
+DATA(insert OID = 1400 ( name PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 name "1043" _null_ _null_ _null_ _null_ text_name _null_ _null_ _null_ ));
DESCR("convert varchar to name");
DATA(insert OID = 1401 ( varchar PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 1043 "19" _null_ _null_ _null_ _null_ name_text _null_ _null_ _null_ ));
DESCR("convert name to varchar");
-DATA(insert OID = 1402 ( current_schema PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ current_schema _null_ _null_ _null_ ));
+DATA(insert OID = 1402 ( current_schema PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ current_schema _null_ _null_ _null_ ));
DESCR("current schema name");
DATA(insert OID = 1403 ( current_schemas PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 1003 "16" _null_ _null_ _null_ _null_ current_schemas _null_ _null_ _null_ ));
DESCR("current schema search list");
@@ -1967,11 +1967,11 @@ DESCR("convert int8 number to hex");
/* for character set encoding support */
/* return database encoding name */
-DATA(insert OID = 1039 ( getdatabaseencoding PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ getdatabaseencoding _null_ _null_ _null_ ));
+DATA(insert OID = 1039 ( getdatabaseencoding PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ getdatabaseencoding _null_ _null_ _null_ ));
DESCR("encoding name of current database");
/* return client encoding name i.e. session encoding */
-DATA(insert OID = 810 ( pg_client_encoding PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 19 "" _null_ _null_ _null_ _null_ pg_client_encoding _null_ _null_ _null_ ));
+DATA(insert OID = 810 ( pg_client_encoding PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 name "" _null_ _null_ _null_ _null_ pg_client_encoding _null_ _null_ _null_ ));
DESCR("encoding name of current database");
DATA(insert OID = 1713 ( length PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 23 "17 19" _null_ _null_ _null_ _null_ length_in_encoding _null_ _null_ _null_ ));
@@ -1989,7 +1989,7 @@ DESCR("convert string with specified encoding names");
DATA(insert OID = 1264 ( pg_char_to_encoding PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 23 "19" _null_ _null_ _null_ _null_ PG_char_to_encoding _null_ _null_ _null_ ));
DESCR("convert encoding name to encoding id");
-DATA(insert OID = 1597 ( pg_encoding_to_char PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 19 "23" _null_ _null_ _null_ _null_ PG_encoding_to_char _null_ _null_ _null_ ));
+DATA(insert OID = 1597 ( pg_encoding_to_char PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 name "23" _null_ _null_ _null_ _null_ PG_encoding_to_char _null_ _null_ _null_ ));
DESCR("convert encoding id to encoding name");
DATA(insert OID = 2319 ( pg_encoding_max_length PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 23 "23" _null_ _null_ _null_ _null_ pg_encoding_max_length_sql _null_ _null_ _null_ ));
@@ -2005,7 +2005,7 @@ DATA(insert OID = 1640 ( pg_get_viewdef PGNSP PGUID 12 1 0 0 0 f f f f t f s
DESCR("select statement of a view");
DATA(insert OID = 1641 ( pg_get_viewdef PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 25 "26" _null_ _null_ _null_ _null_ pg_get_viewdef _null_ _null_ _null_ ));
DESCR("select statement of a view");
-DATA(insert OID = 1642 ( pg_get_userbyid PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 19 "26" _null_ _null_ _null_ _null_ pg_get_userbyid _null_ _null_ _null_ ));
+DATA(insert OID = 1642 ( pg_get_userbyid PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 name "26" _null_ _null_ _null_ _null_ pg_get_userbyid _null_ _null_ _null_ ));
DESCR("role name by OID (with fallback)");
DATA(insert OID = 1643 ( pg_get_indexdef PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 25 "26" _null_ _null_ _null_ _null_ pg_get_indexdef _null_ _null_ _null_ ));
DESCR("index description");
@@ -3765,7 +3765,7 @@ DATA(insert OID = 2420 ( oidvectorrecv PGNSP PGUID 12 1 0 0 0 f f f f t f i
DESCR("I/O");
DATA(insert OID = 2421 ( oidvectorsend PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 17 "30" _null_ _null_ _null_ _null_ oidvectorsend _null_ _null_ _null_ ));
DESCR("I/O");
-DATA(insert OID = 2422 ( namerecv PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 19 "2281" _null_ _null_ _null_ _null_ namerecv _null_ _null_ _null_ ));
+DATA(insert OID = 2422 ( namerecv PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 name "2281" _null_ _null_ _null_ _null_ namerecv _null_ _null_ _null_ ));
DESCR("I/O");
DATA(insert OID = 2423 ( namesend PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 17 "19" _null_ _null_ _null_ _null_ namesend _null_ _null_ _null_ ));
DESCR("I/O");
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 6b248f2..0eee056 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -154,10 +154,10 @@ WHERE p1.oid != p2.oid AND
p2.prosrc NOT LIKE E'range\\_constructor_' AND
(p1.prorettype < p2.prorettype)
ORDER BY 1, 2;
- prorettype | prorettype
-------------+------------
- 25 | 1043
- 1114 | 1184
+ prorettype | prorettype
+-----------------------------+--------------------------
+ text | character varying
+ timestamp without time zone | timestamp with time zone
(2 rows)
SELECT DISTINCT p1.proargtypes[0], p2.proargtypes[0]
--
2.3.0.149.gf3f4077.dirty
On Sat, Feb 21, 2015 at 11:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
The changes in pg_proc.h are just to demonstrate that using names
instead of oids works.
Fwiw I always thought it was strange how much of our bootstrap was
done in a large static text file. Very little of it is actually needed
for bootstrapping, and we could get by with a very small set followed
by a bootstrap script written in standard SQL, not unlike how the
system views are created. It's much easier to type CREATE OPERATOR and
CREATE OPERATOR CLASS with all the symbolic names than to have to
fill in the table.
--
greg
On Sat, Feb 21, 2015 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
On 2/20/15 8:46 PM, Josh Berkus wrote:
Or what about just doing CSV?
I don't think that would actually address the problems. It would just
be the same format as now with different delimiters.
Yea, we need hierarchies and named keys.
Yeah. One thought though is that I don't think we need the "data" layer
in your proposal; that is, I'd flatten the representation to something
more like
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
Even this promises to vastly increase the number of lines in the file,
and make it harder to compare entries by grepping out some common
substring. I agree that the current format is a pain in the tail, but
pg_proc.h is >5k lines already. I don't want it to be 100k lines
instead.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-03-03 21:49:21 -0500, Robert Haas wrote:
On Sat, Feb 21, 2015 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-02-20 22:19:54 -0500, Peter Eisentraut wrote:
On 2/20/15 8:46 PM, Josh Berkus wrote:
Or what about just doing CSV?
I don't think that would actually address the problems. It would just
be the same format as now with different delimiters.
Yea, we need hierarchies and named keys.
Yeah. One thought though is that I don't think we need the "data" layer
in your proposal; that is, I'd flatten the representation to something
more like
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
Even this promises to vastly increase the number of lines in the file,
and make it harder to compare entries by grepping out some common
substring. I agree that the current format is a pain in the tail, but
pg_proc.h is >5k lines already. I don't want it to be 100k lines
instead.
Do you have a better suggestion? Sure it'll be a long file, but it still
seems vastly superior to what we have now.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Even this promises to vastly increase the number of lines in the file,
and make it harder to compare entries by grepping out some common
substring. I agree that the current format is a pain in the tail, but
pg_proc.h is >5k lines already. I don't want it to be 100k lines
instead.
Do you have a better suggestion? Sure it'll be a long file, but it still
seems vastly superior to what we have now.
Not really. What had occurred to me is to try to improve the format
of the DATA lines (e.g. by allowing names to be used instead of OIDs)
but that wouldn't allow defaulted fields to be omitted, which is
certainly a big win. I wonder whether some home-grown single-line
format might be better than using a pre-existing format, but I'm not
too sure it would.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-03-04 08:47:44 -0500, Robert Haas wrote:
Even this promises to vastly increase the number of lines in the file,
and make it harder to compare entries by grepping out some common
substring. I agree that the current format is a pain in the tail, but
pg_proc.h is >5k lines already. I don't want it to be 100k lines
instead.
Do you have a better suggestion? Sure it'll be a long file, but it still
seems vastly superior to what we have now.
Not really. What had occurred to me is to try to improve the format
of the DATA lines (e.g. by allowing names to be used instead of OIDs)
That's a separate patch so far, so if we decide we only want that, we
can do it.
but that wouldn't allow defaulted fields to be omitted, which is
certainly a big win. I wonder whether some home-grown single-line
format might be better than using a pre-existing format, but I'm not
too sure it would.
I can't see readability of anything being good unless the column names
are there - we just have too many columns in some of the tables. I think
having more lines is an acceptable price to pay. We can easily start to
split the files at some point if we want; that'd just be a couple of
lines of code.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 3/3/15 9:49 PM, Robert Haas wrote:
Yeah. One thought though is that I don't think we need the "data" layer
in your proposal; that is, I'd flatten the representation to something
more like
{
    oid => 2249,
    oiddefine => 'CSTRINGOID',
    typname => 'cstring',
    typlen => -2,
    typbyval => 1,
    ...
}
Even this promises to vastly increase the number of lines in the file,
I think lines are cheap. Columns are much harder to deal with.
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
Peter Eisentraut <peter_e@gmx.net> writes:
On 3/3/15 9:49 PM, Robert Haas wrote:
Even this promises to vastly increase the number of lines in the file,
I think lines are cheap. Columns are much harder to deal with.
Yeah. pg_proc.h is already impossible to work with in a standard
80-column window. I don't want to find that the lines mostly wrap even
when I expand my editor window to full screen width, but that is certainly
what will happen if we adopt column labelling *and* insist that entries
remain all on one line. (As a data point, the maximum usable Emacs window
width on my Mac laptop seems to be about 230 characters.)
It's possible that gaining the ability to depend on per-column defaults
would reduce the typical number of fields so much that pg_proc.h entries
would still fit on a line of 100-some characters ... but I'd want to see
proof before assuming that. And pg_proc isn't even our widest catalog.
Some of the ones that are wider, like pg_am, don't seem like there would
be any scope whatsoever for saving space with per-column defaults.
So while I can see the attraction of trying to fit things on one line,
I doubt it's gonna work very well. I'd rather go over to a
one-value-per-line format and live with lots of lines.
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
On that angle, I'm dubious that a format that allows omission of fields is
going to be easy for editing scripts to modify, no matter what the layout
convention is. I've found it relatively easy to write sed or even Emacs
macros to add new column values to old-school pg_proc.h ... but in this
brave new world, I'm going to be really hoping that the column default
works for 99.9% of pg_proc entries when we add a new pg_proc column,
because slipping a value into a desired position is gonna be hard for
a script when you don't know whether the adjacent existing fields are
present or not.
regards, tom lane
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
e.g. grep for a function name and check that all the matches have the
same volatility.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
On that angle, I'm dubious that a format that allows omission of fields is
going to be easy for editing scripts to modify, no matter what the layout
convention is. I've found it relatively easy to write sed or even Emacs
macros to add new column values to old-school pg_proc.h ... but in this
brave new world, I'm going to be really hoping that the column default
works for 99.9% of pg_proc entries when we add a new pg_proc column,
because slipping a value into a desired position is gonna be hard for
a script when you don't know whether the adjacent existing fields are
present or not.
I wonder if we should have a tool in our repository to help people
edit the file. So instead of going in there yourself and changing
things by hand, or writing your own script, you can do:
updatepgproc.pl --oid 5678 provolatile=v
or
updatepgproc.pl --name='.*xact.*' prowhatever=someval
Regardless of what format we end up with, that seems like it would
make things easier.
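For illustration, a rough sketch of such a tool, assuming the data ends
up as one flattened perl hash per row; load_rows() and save_rows() are
hypothetical stand-ins for whatever Catalog.pm would actually provide:

    #!/usr/bin/perl
    # updatepgproc.pl - bulk-edit pg_proc.data entries (sketch only)
    use strict;
    use warnings;
    use Getopt::Long;

    my ($oid, $name_re);
    GetOptions('oid=i' => \$oid, 'name=s' => \$name_re)
        or die "bad options\n";

    # remaining arguments are column=value assignments
    my %set = map { split /=/, $_, 2 } @ARGV;

    my $rows = load_rows('pg_proc');
    foreach my $row (@$rows)
    {
        next if defined $oid
            && (!defined $row->{oid} || $row->{oid} != $oid);
        next if defined $name_re && $row->{proname} !~ /$name_re/;
        $row->{$_} = $set{$_} for keys %set;
    }
    save_rows('pg_proc', $rows);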
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
Could you give an example of the sort of thing you wish to do?
e.g. grep for a function name and check that all the matches have the
same volatility.
Well, grep is not going to work too well anymore, but extracting a
specific field from an entry is going to be beyond the competence of
simple grep/sed tools anyway if we allow column default substitutions.
I think a fairer question is "can you do that in a one-liner Perl script",
which seems like it might be achievable given an appropriate choice of
data markup language.
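As a plausibility check, here's roughly what that could look like,
assuming the flattened format discussed above and that pg_proc.data
evaluates to an array of row hashes when loaded with do (the 'v'
fallback mirrors provolatile's usual default):

    use strict;
    use warnings;

    my $rows = do 'pg_proc.data'
        or die "could not load pg_proc.data: $@";

    # group volatilities by function name, then report inconsistencies
    my %vol;
    push @{ $vol{ $_->{proname} } }, $_->{provolatile} || 'v' for @$rows;

    for my $name (sort keys %vol)
    {
        my %seen = map { ($_ => 1) } @{ $vol{$name} };
        print "$name: @{ $vol{$name} }\n" if keys %seen > 1;
    }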
regards, tom lane
On 03/04/2015 09:42 AM, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
On 3/3/15 9:49 PM, Robert Haas wrote:
Even this promises to vastly increase the number of lines in the file,
I think lines are cheap. Columns are much harder to deal with.
Yeah. pg_proc.h is already impossible to work with in a standard
80-column window. I don't want to find that the lines mostly wrap even
when I expand my editor window to full screen width, but that is certainly
what will happen if we adopt column labelling *and* insist that entries
remain all on one line. (As a data point, the maximum usable Emacs window
width on my Mac laptop seems to be about 230 characters.)
It's possible that gaining the ability to depend on per-column defaults
would reduce the typical number of fields so much that pg_proc.h entries
would still fit on a line of 100-some characters ... but I'd want to see
proof before assuming that. And pg_proc isn't even our widest catalog.
Some of the ones that are wider, like pg_am, don't seem like there would
be any scope whatsoever for saving space with per-column defaults.
So while I can see the attraction of trying to fit things on one line,
I doubt it's gonna work very well. I'd rather go over to a
one-value-per-line format and live with lots of lines.
Is it necessarily an all or nothing deal?
Taking a previous example, we could have something like:
{
    oid => 2249, oiddefine => 'CSTRINGOID', typname => 'cstring',
    typlen => -2, typbyval => 1,
    ...
}
which would allow us to fit within a reasonable edit window (for my
normal window and font that's around 180 characters) and still reduce
the number of lines.
I'm not wedded to it, but it's a thought.
cheers
andrew
* Robert Haas (robertmhaas@gmail.com) wrote:
On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
On that angle, I'm dubious that a format that allows omission of fields is
going to be easy for editing scripts to modify, no matter what the layout
convention is. I've found it relatively easy to write sed or even Emacs
macros to add new column values to old-school pg_proc.h ... but in this
brave new world, I'm going to be really hoping that the column default
works for 99.9% of pg_proc entries when we add a new pg_proc column,
because slipping a value into a desired position is gonna be hard for
a script when you don't know whether the adjacent existing fields are
present or not.
I wonder if we should have a tool in our repository to help people
edit the file. So instead of going in there yourself and changing
things by hand, or writing your own script, you can do:
updatepgproc.pl --oid 5678 provolatile=v
or
updatepgproc.pl --name='.*xact.*' prowhatever=someval
Regardless of what format we end up with, that seems like it would
make things easier.
Alright, I'll bite on this - we have this really neat tool for editing
data in bulk, or individual values, or pulling out data to look at based
on particular values or even functions... It's called PostgreSQL.
What if we had an easy way to export an existing table into whatever
format we decide to use for initdb to use? For that matter, what if
that file was simple to import into PG?
What about having a way to load all the catalog tables from their git
repo files into a "pg_dev" schema? Maybe even include a make target or
initdb option which does that? (the point here being to provide a way
to modify the tables and compare the results to the existing tables
without breaking the instance one is using for this)
I have to admit that I've never tried to do that with the existing
format, but seems like an interesting idea to consider. I further
wonder if it'd be possible to generate the table structures too...
Thanks!
Stephen
On 03/04/2015 09:51 AM, Robert Haas wrote:
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
e.g. grep for a function name and check that all the matches have the
same volatility.
I think grep will be the wrong tool for this format, but if we're
settling on a perl format, a few perl one-liners should be able to work
pretty well. It might be worth shipping a small perl module that
provided some functions, or a script doing common tasks (or both).
cheers
andrew
On 2015-03-04 09:55:01 -0500, Robert Haas wrote:
On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wonder if we should have a tool in our repository to help people
edit the file. So instead of going in there yourself and changing
things by hand, or writing your own script, you can do:
updatepgproc.pl --oid 5678 provolatile=v
or
updatepgproc.pl --name='.*xact.*' prowhatever=someval
Regardless of what format we end up with, that seems like it would
make things easier.
The stuff I've started to work on basically allows loading the data
that Catalog.pm provides (in an extended format), editing it in memory,
and then serializing it again. So such a thing could relatively easily
be added if somebody wants to do so. I sure hope though that the need
for it will become drastically lower with the new format.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 4, 2015 at 10:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
Is it necessarily an all or nothing deal?
Taking a previous example, we could have something like:
{
    oid => 2249, oiddefine => 'CSTRINGOID', typname => 'cstring',
    typlen => -2, typbyval => 1,
    ...
}
which would allow us to fit within a reasonable edit window (for my
normal window and font that's around 180 characters) and still reduce
the number of lines.
I'm not wedded to it, but it's a thought.
Another advantage of this is that it would probably make git less
likely to fumble a rebase. If there are lots of places in the file
where we have the same 10 lines in a row with occasional variations,
rebasing a patch could easily pick the the wrong place to reapply the
hunk. I would personally consider a substantial increase in the rate
of such occurrences as being a cure far, far worse than the disease.
If you keep the entry for each function on just a couple of lines the
chances of this happening are greatly reduced, because you're much less
likely to get a false match to surrounding context.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Another advantage of this is that it would probably make git less
likely to fumble a rebase. If there are lots of places in the file
where we have the same 10 lines in a row with occasional variations,
rebasing a patch could easily pick the the wrong place to reapply the
hunk.
That is a really, really good point.
I had been thinking it was a disadvantage of Andrew's proposal that
line breaks would tend to fall in inconsistent places from one entry
to another ... but from this perspective, maybe that's not such a
bad thing.
regards, tom lane
Andrew Dunstan wrote:
On 03/04/2015 09:51 AM, Robert Haas wrote:
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
e.g. grep for a function name and check that all the matches have the
same volatility.
I think grep will be the wrong tool for this format, but if we're settling
on a perl format, a few perl one-liners should be able to work pretty well.
It might be worth shipping a small perl module that provided some functions,
or a script doing common tasks (or both).
I was going to say the same thing. We need to make sure that the output
format of those one-liners is consistent, though -- it wouldn't be nice
if adding one column with a nondefault value to a dozen entries changed
the formatting of other entries. For example, perhaps declare that the
order of entries is alphabetical, or that it matches something declared
at the start of the file.
From that POV, I don't like the idea of having multiple columns for a
single entry in a single line; adding more columns means that eventually
we're going to split lines that have become too long in a different
place, which would reformat the whole file; not very nice. But maybe
this doesn't matter if we decree that changing the column split is a
manual chore rather than automatic, because then it can be done in a
separate mechanical commit after the extra column is added.
BTW one solution to the merge problem is to have unique separators for
each entry. For instance, instead of
} -- this is the end of the previous entry
,
{
oid = 2233,
proname = array_append,
we could have
} # array_prepend 2232
,
{ # array_append 2233
oid = 2233,
proname = array_append,
where the funcname-oid comment is there to avoid busted merges. The
automatic editing tools would make sure that those markers are always present.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 4, 2015 at 2:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Andrew Dunstan wrote:
On 03/04/2015 09:51 AM, Robert Haas wrote:
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
e.g. grep for a function name and check that all the matches have the
same volatility.
I think grep will be the wrong tool for this format, but if we're settling
on a perl format, a few perl one-liners should be able to work pretty well.
It might be worth shipping a small perl module that provided some functions,
or a script doing common tasks (or both).
I was going to say the same thing. We need to make sure that the output
format of those one-liners is consistent, though -- it wouldn't be nice
if adding one column with a nondefault value to a dozen entries changed
the formatting of other entries. For example, perhaps declare that the
order of entries is alphabetical, or that it matches something declared
at the start of the file.
From that POV, I don't like the idea of having multiple columns for a
single entry in a single line; adding more columns means that eventually
we're going to split lines that have become too long in a different
place, which would reformat the whole file; not very nice. But maybe
this doesn't matter if we decree that changing the column split is a
manual chore rather than automatic, because then it can be done in a
separate mechanical commit after the extra column is added.
BTW one solution to the merge problem is to have unique separators for
each entry. For instance, instead of
} -- this is the end of the previous entry
,
{
oid = 2233,
proname = array_append,
we could have
} # array_prepend 2232
,
{ # array_append 2233
oid = 2233,
proname = array_append,
where the funcname-oid comment is there to avoid busted merges. The
automatic editing tools would make sure that those markers are always present.
Speaking from entirely too much experience, that's not nearly enough.
git only needs 3 lines of context to apply a hunk with no qualms at
all, and it'll shade that to just 1 or 2 with little fanfare. If your
pg_proc entries are each 20 lines long, this sort of thing will
provide little protection.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 3/4/15 9:51 AM, Robert Haas wrote:
On Wed, Mar 4, 2015 at 9:06 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
e.g. grep for a function name and check that all the matches have the
same volatility.
You could still do that with grep -A or something like that. I think
it would actually be easier than it is now.
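For example, assuming one field per line and that provolatile lands
within a few lines of proname (neither is decided yet):
grep -A 8 "proname => 'bool" pg_proc.data | grep provolatile | sort -u
A single line of output would then mean all matches share the same
volatility.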
Robert Haas wrote:
On Wed, Mar 4, 2015 at 2:27 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
BTW one solution to the merge problem is to have unique separators for
each entry. For instance, instead of
Speaking from entirely too much experience, that's not nearly enough.
git only needs 3 lines of context to apply a hunk with no qualms at
all, and it'll shade that to just 1 or 2 with little fanfare. If your
pg_proc entries are each 20 lines long, this sort of thing will
provide little protection.
Yeah, you're right. This is going to be a problem, and we need some
solution for it. I'm out of ideas, other than of course getting each
entry to be at most two lines long, which nobody seems to like (for good
reasons).
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 3/4/15 9:07 AM, Stephen Frost wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
On Wed, Mar 4, 2015 at 9:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
and make it harder to compare entries by grepping out some common
substring.
Could you give an example of the sort of thing you wish to do?
On that angle, I'm dubious that a format that allows omission of fields is
going to be easy for editing scripts to modify, no matter what the layout
convention is. I've found it relatively easy to write sed or even Emacs
macros to add new column values to old-school pg_proc.h ... but in this
brave new world, I'm going to be really hoping that the column default
works for 99.9% of pg_proc entries when we add a new pg_proc column,
because slipping a value into a desired position is gonna be hard for
a script when you don't know whether the adjacent existing fields are
present or not.
I wonder if we should have a tool in our repository to help people
edit the file. So instead of going in there yourself and changing
things by hand, or writing your own script, you can do:
updatepgproc.pl --oid 5678 provolatile=v
or
updatepgproc.pl --name='.*xact.*' prowhatever=someval
Regardless of what format we end up with, that seems like it would
make things easier.
Alright, I'll bite on this: we have this really neat tool for editing
data in bulk, or individual values, or pulling out data to look at based
on particular values or even functions... It's called PostgreSQL.
What if we had an easy way to export an existing table into whatever
format we decide on for initdb to use? For that matter, what if
that file was simple to import into PG?
What about having a way to load all the catalog tables from their git
repo files into a "pg_dev" schema? Maybe even include a make target or
initdb option which does that? (The point here being to provide a way
to modify the tables and compare the results to the existing tables
without breaking the instance one is using for this.)
I have to admit that I've never tried to do that with the existing
format, but it seems like an interesting idea to consider. I further
wonder if it'd be possible to generate the table structures too.
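As a very rough sketch of the export half of that idea (using DBI; the
column list and output format are stand-ins, not a worked-out design):
use strict;
use warnings;
use DBI;

# Connect to a scratch cluster whose catalogs were edited via SQL.
my $dbh = DBI->connect('dbi:Pg:dbname=postgres', '', '',
                       { RaiseError => 1 });

my $sth = $dbh->prepare(q{
    SELECT oid, proname, prorettype, provolatile
    FROM pg_proc ORDER BY oid
});
$sth->execute;

# Emit one compact entry per row, in a stable order so diffs stay small.
while (my $r = $sth->fetchrow_hashref) {
    printf "{ oid => %d, proname => '%s', prorettype => %d, provolatile => '%s' },\n",
        @$r{qw(oid proname prorettype provolatile)};
}
$dbh->disconnect;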
Semi-related... if we put some special handling in some places for
bootstrap mode, couldn't most catalog objects be created using SQL, once
we got pg_class, pg_attribute and pg_type created? That would
theoretically allow us to drive much more of initdb with plain SQL
(possibly created via pg_dump).
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attribute and pg_type created? That would theoretically allow
us to drive much more of initdb with plain SQL (possibly created via
pg_dump).
Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.
That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 03/07/2015 05:46 PM, Andres Freund wrote:
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attribute and pg_type created? That would theoretically allow
us to drive much more of initdb with plain SQL (possibly created via
pg_dump).
Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.
My reaction exactly. We should not make users pay a price for
developers' convenience.
That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.
Indeed.
cheers
andrew
* Andrew Dunstan (andrew@dunslane.net) wrote:
On 03/07/2015 05:46 PM, Andres Freund wrote:
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attribute and pg_type created? That would theoretically allow
us to drive much more of initdb with plain SQL (possibly created via
pg_dump).
Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.
My reaction exactly. We should not make users pay a price for
developers' convenience.
Just to clarify, since Jim was responding to my comment, my thought was
*not* to use SQL commands inside initdb, but rather to use PG to create
the source files that we have today in our tree, which wouldn't slow
down initdb at all.
That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.
Indeed.
No, they aren't, but having the data in a table in PG, with a way to
easily export to the format needed by BKI, would make bulk updates much
easier.
Thanks!
Stephen
On 3/7/15 6:02 PM, Stephen Frost wrote:
* Andrew Dunstan (andrew@dunslane.net) wrote:
On 03/07/2015 05:46 PM, Andres Freund wrote:
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attribute and pg_type created? That would theoretically allow
us to drive much more of initdb with plain SQL (possibly created via
pg_dump).
Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.
My reaction exactly. We should not make users pay a price for
developers' convenience.
How often does a normal user actually initdb? I don't think it's that
incredibly common. Added time to our development cycle certainly is a
concern though.
Just to clarify, since Jim was responding to my comment, my thought was
*not* to use SQL commands inside initdb, but rather to use PG to create
the source files that we have today in our tree, which wouldn't slow
down initdb at all.
Yeah, I was thinking SQL would make it even easier, but perhaps not.
Since the other options here seem to have hit a dead end, though, it
seems your load-it-into-tables idea is what we've got left...
That's besides the fact that SQL commands aren't actually that
comfortably editable in bulk.
Indeed.
No, they aren't, but having the data in a table in PG, with a way to
easily export to the format needed by BKI, would make bulk updates much
easier.
My thought was that pg_dump would be useful here, so instead of hand
editing you'd just make changes in a live database and then dump it.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2015-03-04 10:25:58 -0500, Robert Haas wrote:
Another advantage of this is that it would probably make git less
likely to fumble a rebase. If there are lots of places in the file
where we have the same 10 lines in a row with occasional variations,
rebasing a patch could easily pick the wrong place to reapply the
hunk. I would personally consider a substantial increase in the rate
of such occurrences as being a cure far, far worse than the disease.
If you keep the entry for each function on just a couple of lines, the
chances of this happening are greatly reduced, because you're much
less likely to get a false match to surrounding context.
I'm not particularly worried about this. Especially with attribute
defaults it seems unlikely that you often have the same three
surrounding lines in both directions in a similar region of the file.
And even if it turns out to actually be bothersome, you can help
yourself by passing -U 5/setting diff.context = 5 or something like
that.
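For reference, the two spellings of that:
git diff -U5                  # wider context for a single invocation
git config diff.context 5    # per-repository default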
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
On 2015-03-04 10:25:58 -0500, Robert Haas wrote:
Another advantage of this is that it would probably make git less
likely to fumble a rebase. If there are lots of places in the file
where we have the same 10 lines in a row with occasional variations,
rebasing a patch could easily pick the wrong place to reapply the
hunk. I would personally consider a substantial increase in the rate
of such occurrences as being a cure far, far worse than the disease.
If you keep the entry for each function on just a couple of lines, the
chances of this happening are greatly reduced, because you're much
less likely to get a false match to surrounding context.
I'm not particularly worried about this. Especially with attribute
defaults it seems unlikely that you often have the same three
surrounding lines in both directions in a similar region of the file.
Really? A lot depends on the details of how we choose to lay out these
files, but you could easily blow all your safety margin on lines
containing just braces, for instance.
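To illustrate: with one field per line, the boundary between any two
entries can be byte-for-byte identical everywhere in the file, e.g.
    provolatile => 'i',
},
{
and a hunk whose context consists only of such lines can apply cleanly
at entirely the wrong entry.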
I'll reserve judgment on this till I see the proposed new catalog data
files, but I absolutely reject any contention that it's not something
to worry about.
And even if it turns out to actually be bothersome, you can help
yourself by passing -U 5/setting diff.context = 5 or something like
that.
Um. Good luck with getting every patch submitter to do that.
regards, tom lane
Stephen Frost <sfrost@snowman.net> writes:
* Andrew Dunstan (andrew@dunslane.net) wrote:
On 03/07/2015 05:46 PM, Andres Freund wrote:
On 2015-03-07 16:43:15 -0600, Jim Nasby wrote:
Semi-related... if we put some special handling in some places for bootstrap
mode, couldn't most catalog objects be created using SQL, once we got
pg_class, pg_attribute and pg_type created?
Several people have now made that suggestion, but I *seriously* doubt
that we actually want to go there. The overhead of executing SQL
commands in comparison to the bki stuff is really rather
noticeable. Doing the majority of the large number of insertions via SQL
will make initdb noticeably slower. And it's already annoyingly
slow. Besides make install it's probably the thing I wait most for
during development.
My reaction exactly. We should not make users pay a price for
developers' convenience.
Another reason not to do this is that it would require a significant (in
my judgment) amount of crapification of a lot of code with bootstrap-mode
special cases. Neither the parser, the planner, nor the executor could
function in bootstrap mode without a lot of lobotomization. Far better
to confine all that ugliness to bootstrap.c.
Just to clarify, since Jim was responding to my comment, my thought was
*not* to use SQL commands inside initdb, but rather to use PG to create
the source files that we have today in our tree, which wouldn't slow
down initdb at all.
That, on the other hand, might be a sane suggestion. I'm not sure
though. It feels more like "use the hammer you have at hand" than
necessarily being a good fit. In particular, keeping the raw data in
some tables doesn't seem like an environment that would naturally
distinguish between hard-coded and defaultable values. For instance,
how would you distinguish hard-coded OIDs from ones that could be
assigned at initdb's whim?
regards, tom lane
Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
And even if it turns out to actually be bothersome, you can help
yourself by passing -U 5/setting diff.context = 5 or something like
that.
Um. Good luck with getting every patch submitter to do that.
Can we do it centrally somehow?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 03/08/2015 10:11 PM, Alvaro Herrera wrote:
Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
And even if it turns out to actually be bothersome, you can help
yourself by passing -U 5/setting diff.context = 5 or something like
that.
Um. Good luck with getting every patch submitter to do that.
Can we do it centrally somehow?
I don't believe there is any provision for setting diff.context on a
per-file basis.
cheers
andrew
On Sun, Mar 8, 2015 at 12:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2015-03-04 10:25:58 -0500, Robert Haas wrote:
Another advantage of this is that it would probably make git less
likely to fumble a rebase. If there are lots of places in the file
where we have the same 10 lines in a row with occasional variations,
rebasing a patch could easily pick the wrong place to reapply the
hunk. I would personally consider a substantial increase in the rate
of such occurrences as being a cure far, far worse than the disease.
If you keep the entry for each function on just a couple of lines, the
chances of this happening are greatly reduced, because you're much
less likely to get a false match to surrounding context.
I'm not particularly worried about this. Especially with attribute
defaults it seems unlikely that you often have the same three
surrounding lines in both directions in a similar region of the file.
That's woefully optimistic, and you don't need to have 3 lines. 1 or
2 will do fine.
And even if it turns out to actually be bothersome, you can help
yourself by passing -U 5/setting diff.context = 5 or something like
that.
I don't believe that for a minute. When you have your own private
branch and you do 'git rebase master', how's that going to help?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-03-07 18:09:36 -0600, Jim Nasby wrote:
How often does a normal user actually initdb? I don't think it's that
incredibly common. Added time to our development cycle certainly is a
concern though.
There are many shops that run initdb as part of their test/CI systems.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services