Some 8.4 changes needed according to pg_migrator testing

Started by Tom Laneover 16 years ago24 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I was just talking to Bruce about his results from testing pg_migrator,
and we realized there are a couple of changes that we need to slip into
the core code before 8.4 goes final.

1. pg_dumpall dumps CREATE DATABASE commands that include the source
database's encoding, lc_collate, and lc_ctype settings ... but if
dumping from a pre-8.4 server it just omits the lc_ settings. This
is flat-out wrong (independently of pg_migrator). The correct behavior
when dumping from pre-8.4 is to get the server-wide locale settings
and include those in the CREATE DATABASE commands. Otherwise you're
not restoring the full state of the database correctly. This is
critical in view of the 8.4 changes to make CREATE DATABASE enforce
encoding-vs-locale match --- if you try to load the dump into a server
with a different default locale, it'll probably fail, and there's
absolutely no reason why it should.

2. There seem to be some corner cases where creating a table in the new
database will not create a toast table even though there was one in the
previous instance. (I'm not 100% convinced that this can happen if we
create and then drop dropped columns, for instance ... but I'm not
convinced it can't happen, either.) If there is a toast table in the
old database then pg_migrator must bring it over because it might
possibly contain live data. However, as toasting.c is presently coded
there is no way to force it to create a toast table. I think we should
change AlterTableCreateToastTable to add a "bool force" parameter.
Alternatively we could add a separate entry point, but the option seems
a bit cleaner.

Barring objections I'll commit changes for both of these before beta2.

regards, tom lane

#2Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#1)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

2. There seem to be some corner cases where creating a table in the new
database will not create a toast table even though there was one in the
previous instance. (I'm not 100% convinced that this can happen if we
create and then drop dropped columns, for instance ... but I'm not
convinced it can't happen, either.) If there is a toast table in the
old database then pg_migrator must bring it over because it might
possibly contain live data. However, as toasting.c is presently coded
there is no way to force it to create a toast table. I think we should
change AlterTableCreateToastTable to add a "bool force" parameter.
Alternatively we could add a separate entry point, but the option seems
a bit cleaner.

Hmm, what about toast reloptions? They are not a problem now of course, but
could be in a 8.4->8.5 migration.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: Some 8.4 changes needed according to pg_migrator testing

Alvaro Herrera <alvherre@commandprompt.com> writes:

Tom Lane wrote:

2. There seem to be some corner cases where creating a table in the new
database will not create a toast table even though there was one in the
previous instance.

Hmm, what about toast reloptions? They are not a problem now of course, but
could be in a 8.4->8.5 migration.

I don't think it's an issue. The type of scenario we are looking at is
where there is no need for a toast table *now*, but there might be some
old rows hanging around that got toasted anyway. (Say, you originally
had two wide varchar columns and then dropped one.) It seems unlikely
that preserving the reloptions for the toast table is going to be all
that critical in this type of scenario.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: Some 8.4 changes needed according to pg_migrator testing

I wrote:

1. pg_dumpall dumps CREATE DATABASE commands that include the source
database's encoding, lc_collate, and lc_ctype settings ... but if
dumping from a pre-8.4 server it just omits the lc_ settings. This
is flat-out wrong (independently of pg_migrator). The correct behavior
when dumping from pre-8.4 is to get the server-wide locale settings
and include those in the CREATE DATABASE commands.

Actually, there's another issue that comes to mind here: since we are
relying on platform-dependent locale names, including those in the dump
is going to pose a severe problem for porting dumps across platforms
(where "different platform" could even mean "different libc release").
We're already at risk with respect to dumps from 8.4, even without the
above-proposed change.

I am not sure what we can do about this. Ideas?

regards, tom lane

#5David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#4)
Re: Some 8.4 changes needed according to pg_migrator testing

On May 7, 2009, at 10:18 AM, Tom Lane wrote:

Actually, there's another issue that comes to mind here: since we are
relying on platform-dependent locale names, including those in the
dump
is going to pose a severe problem for porting dumps across platforms
(where "different platform" could even mean "different libc release").
We're already at risk with respect to dumps from 8.4, even without the
above-proposed change.

I am not sure what we can do about this. Ideas?

Abandon platform-dependent locales?

Kidding! (Sort of.)

Best,

David

#6Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#4)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

Actually, there's another issue that comes to mind here: since we are
relying on platform-dependent locale names, including those in the dump
is going to pose a severe problem for porting dumps across platforms
(where "different platform" could even mean "different libc release").
We're already at risk with respect to dumps from 8.4, even without the
above-proposed change.

I am not sure what we can do about this. Ideas?

I don't think there's much we can do apart from telling the user not to
move stuff across platforms that do not have equally named locales.
Maybe what we can do is have a mechanism for pg_restore to map one
locale from the dump file into another. So the user can specify a file
with lines like
"en_US := English_UnitedStates"
etc

(For text dumps, the only solution would be for the user to edit the
dump manually; perhaps provide a pg_dump switch to avoid dumping
locale config?).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#7Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#6)
Re: Some 8.4 changes needed according to pg_migrator testing

Alvaro Herrera wrote:

Tom Lane wrote:

Actually, there's another issue that comes to mind here: since we are
relying on platform-dependent locale names, including those in the dump
is going to pose a severe problem for porting dumps across platforms
(where "different platform" could even mean "different libc release").
We're already at risk with respect to dumps from 8.4, even without the
above-proposed change.

I am not sure what we can do about this. Ideas?

I don't think there's much we can do apart from telling the user not to
move stuff across platforms that do not have equally named locales.
Maybe what we can do is have a mechanism for pg_restore to map one
locale from the dump file into another. So the user can specify a file
with lines like
"en_US := English_UnitedStates"
etc

(For text dumps, the only solution would be for the user to edit the
dump manually; perhaps provide a pg_dump switch to avoid dumping
locale config?).

We have a pg_dump switch that sets the encoding. Perhaps we could have a
pg_dump switch that "fakes" the output locale? Seems awfully kludgy
though - I'd much rather see us supporting it on pg_restore and just say
that if you are dumping in plaintext, well, use a plaintext editor to
edit it.

//Magnus

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#7)
Re: Some 8.4 changes needed according to pg_migrator testing

Magnus Hagander <magnus@hagander.net> writes:

Alvaro Herrera wrote:

(For text dumps, the only solution would be for the user to edit the
dump manually; perhaps provide a pg_dump switch to avoid dumping
locale config?).

We have a pg_dump switch that sets the encoding. Perhaps we could have a
pg_dump switch that "fakes" the output locale? Seems awfully kludgy
though - I'd much rather see us supporting it on pg_restore and just say
that if you are dumping in plaintext, well, use a plaintext editor to
edit it.

I don't think a solution that requires you to know about this in advance
(ie when you make the dump) is going to be very satisfactory.

I'm inclined to think that the most usable answer is to have some way of
getting CREATE DATABASE itself to apply a locale-name mapping.

Or we could try to make the user-visible locale names
platform-independent in the first place, a la David's not-silly-at-all
suggestion. I think the part that goes "en_US" or whatever is actually
quite well standardized (except for good ol' Windows, but we could
provide a mapping from the Unix-style names to Windows names). It's the
encoding-name part that's not very stable. If we could hide that from
the user and tack it on internally, things would be much better.

regards, tom lane

#9Bernd Helmle
mailings@oopsware.de
In reply to: Tom Lane (#8)
Re: Some 8.4 changes needed according to pg_migrator testing

--On 7. Mai 2009 15:32:01 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think the part that goes "en_US" or whatever is actually
quite well standardized (except for good ol' Windows, but we could
provide a mapping from the Unix-style names to Windows names).

I like this idea, but i could imagine that this is pretty hard to maintain,
once someone decides to change things suddenly?

--
Thanks

Bernd

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bernd Helmle (#9)
Re: Some 8.4 changes needed according to pg_migrator testing

Bernd Helmle <mailings@oopsware.de> writes:

--On 7. Mai 2009 15:32:01 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think the part that goes "en_US" or whatever is actually
quite well standardized (except for good ol' Windows, but we could
provide a mapping from the Unix-style names to Windows names).

I like this idea, but i could imagine that this is pretty hard to maintain,
once someone decides to change things suddenly?

Well, we'd probably want to make sure there was an escape-hatch whereby
you could specify an exact platform-dependent locale name, in case
whatever we were doing didn't work on a particular platform. I just don't
want that to be the norm.

Possibly it would work to first try the locale name as given by the
user, and if that doesn't work (either isn't recognized, or doesn't seem
to use the right encoding) then try to map/modify it.

regards, tom lane

#11Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

2. There seem to be some corner cases where creating a table in the new
database will not create a toast table even though there was one in the
previous instance. (I'm not 100% convinced that this can happen if we
create and then drop dropped columns, for instance ... but I'm not
convinced it can't happen, either.) If there is a toast table in the
old database then pg_migrator must bring it over because it might
possibly contain live data. However, as toasting.c is presently coded
there is no way to force it to create a toast table. I think we should
change AlterTableCreateToastTable to add a "bool force" parameter.
Alternatively we could add a separate entry point, but the option seems
a bit cleaner.

The bottom line is that the TOAST logic was so fluid on when it thinks a
TOAST table is needed that even if it it consistent from 8.3 -> 8.4, it
would likely break in some later release and it was just safer to add a
boolean.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#12David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#8)
Re: Some 8.4 changes needed according to pg_migrator testing

On May 7, 2009, at 12:32 PM, Tom Lane wrote:

Or we could try to make the user-visible locale names
platform-independent in the first place, a la David's not-silly-at-all
suggestion.

Actually, what I was thinking of was using a platform-independent
locale infrastructure: the inconsistency in behavior between platforms
is astonishing and annoying. But this works as a stopgap.

Best,

David

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Alvaro Herrera (#6)
Re: Some 8.4 changes needed according to pg_migrator testing

On Thursday 07 May 2009 20:54:37 Alvaro Herrera wrote:

I don't think there's much we can do apart from telling the user not to
move stuff across platforms that do not have equally named locales.

The other part of the problem is that there is no guarantee that equally or
similarly named locales behave the same. There will necessarily be a user-
beware factor here, and perhaps blowing up when the locale name is not
recognized is safer and alerts about this fact better than trying to make it
match by force somehow.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#13)
Re: Some 8.4 changes needed according to pg_migrator testing

Peter Eisentraut <peter_e@gmx.net> writes:

On Thursday 07 May 2009 20:54:37 Alvaro Herrera wrote:

I don't think there's much we can do apart from telling the user not to
move stuff across platforms that do not have equally named locales.

The other part of the problem is that there is no guarantee that equally or
similarly named locales behave the same. There will necessarily be a user-
beware factor here, and perhaps blowing up when the locale name is not
recognized is safer and alerts about this fact better than trying to make it
match by force somehow.

We have never before operated on the assumption that it's okay for
pg_dump output to be locked to particular platforms, and I do not think
we should start now. So we've got to do *something* about this.
(Do you really want PG Windows users to be unable to port to a better
platform?)

I note also that the problem is in pg_dumpall not pg_dump, so pg_restore
is not going to help us --- if someone has a problem his only recourse
would be manual editing of a possibly-monstrous SQL script file.

regards, tom lane

#15Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#14)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

On Thursday 07 May 2009 20:54:37 Alvaro Herrera wrote:

I don't think there's much we can do apart from telling the user not to
move stuff across platforms that do not have equally named locales.

The other part of the problem is that there is no guarantee that equally or
similarly named locales behave the same. There will necessarily be a user-
beware factor here, and perhaps blowing up when the locale name is not
recognized is safer and alerts about this fact better than trying to make it
match by force somehow.

We have never before operated on the assumption that it's okay for
pg_dump output to be locked to particular platforms, and I do not think
we should start now. So we've got to do *something* about this.
(Do you really want PG Windows users to be unable to port to a better
platform?)

I note also that the problem is in pg_dumpall not pg_dump, so pg_restore
is not going to help us --- if someone has a problem his only recourse
would be manual editing of a possibly-monstrous SQL script file.

How about only outputting the LC_COLLATE/CTYPE options for databases
that use a non-default setting? In the common scenarios where you have
the same collation for the whole cluster it would work just like in
previous releases. If you dump and restore a database with default
locale to a cluster initialized with a different locale, the database is
restored with the default locale of the target cluster. But if you
explicitly set a database to use a different locale, that would be
preserved in dumps with the caveat that you'd have to change it manually
if you restore to a cluster on a different platform.

If we go with that, we should probably make the notion of a default
collation explicit. We could set pg_database.datcollate/datctype column
to NULL to mean "use the cluster default".

I don't find the idea of creating mapping tables of locale names very
appetizing. Looking at our encoding name mapping table, there's quite a
few different spellings of different encoding names alone, let alone all
locale names.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Heikki Linnakangas (#15)
Re: Some 8.4 changes needed according to pg_migrator testing

On Friday 08 May 2009 19:09:51 Heikki Linnakangas wrote:

How about only outputting the LC_COLLATE/CTYPE options for databases
that use a non-default setting? In the common scenarios where you have
the same collation for the whole cluster it would work just like in
previous releases. If you dump and restore a database with default
locale to a cluster initialized with a different locale, the database is
restored with the default locale of the target cluster. But if you
explicitly set a database to use a different locale, that would be
preserved in dumps with the caveat that you'd have to change it manually
if you restore to a cluster on a different platform.

That was my latest thinking as well. And it preserves the not-uncommon use
case that you pg_dumpall and restore your database after having initdb'ed with
a different locale/encoding in order to, say, switch to Unicode.

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#16)
Re: Some 8.4 changes needed according to pg_migrator testing

Peter Eisentraut <peter_e@gmx.net> writes:

On Friday 08 May 2009 19:09:51 Heikki Linnakangas wrote:

How about only outputting the LC_COLLATE/CTYPE options for databases
that use a non-default setting?

That was my latest thinking as well.

Logically we should handle database encoding the same way, no?

I'm not really satisfied with this as a long-term solution, but it may
be the most we can hope to squeeze into 8.4. I'm willing to go make the
pg_dumpall changes if there are not objections/better ideas.

If we go with that, we should probably make the notion of a default
collation explicit. We could set pg_database.datcollate/datctype column
to NULL to mean "use the cluster default".

I don't find that to be a good idea, mainly because there *is* no
"cluster default" anymore. I think it's okay for pg_dumpall to treat
the template0 settings as being the "default" for its purposes, but keep
in mind that there is nothing much stopping a DBA from replacing
template0 with another DB that has different settings. We can't have
that break existing DBs.

regards, tom lane

#18Alvaro Herrera
alvherre@commandprompt.com
In reply to: Heikki Linnakangas (#15)
Re: Some 8.4 changes needed according to pg_migrator testing

Heikki Linnakangas wrote:

If we go with that, we should probably make the notion of a default
collation explicit. We could set pg_database.datcollate/datctype column
to NULL to mean "use the cluster default".

I'm not sure how this would work. If I initdb with a certain
locale/encoding and then create a database with default locale/encoding,
how would a restore work on a cluster that has been initdb'd with a
different locale/encoding? If you don't dump the locale specification,
it could very well not match what the user intended.

I don't find the idea of creating mapping tables of locale names very
appetizing. Looking at our encoding name mapping table, there's quite a
few different spellings of different encoding names alone, let alone all
locale names.

Yeah, it doesn't seem pleasant that way. When I proposed it I was
thinking that the mapping would be specified by the user. OTOH since
it's only pg_dumpall output that's the problem, this idea is not be very
useful because there will be no way for a tool to do the replacement.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#18)
Re: Some 8.4 changes needed according to pg_migrator testing

Alvaro Herrera <alvherre@commandprompt.com> writes:

Heikki Linnakangas wrote:

If we go with that, we should probably make the notion of a default
collation explicit. We could set pg_database.datcollate/datctype column
to NULL to mean "use the cluster default".

I'm not sure how this would work. If I initdb with a certain
locale/encoding and then create a database with default locale/encoding,
how would a restore work on a cluster that has been initdb'd with a
different locale/encoding? If you don't dump the locale specification,
it could very well not match what the user intended.

As Peter suggested, that's more or less the point. As long as we still
have pg_dump output include the client_encoding setting, it is sensible
to try to load a dump into a different default database encoding/locale;
and in fact you can not guarantee that the new platform's locales behave
exactly the same anyway.

One problem that just occurred to me is that this solution may not be
adequate for pg_migrator. It will need to check that the new database
has the same "default" settings (where "default" means "those of
template0") as the old installation did. I think that's probably doable
but we'll have to check.

regards, tom lane

#20Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#19)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

One problem that just occurred to me is that this solution may not be
adequate for pg_migrator. It will need to check that the new database
has the same "default" settings (where "default" means "those of
template0") as the old installation did. I think that's probably doable
but we'll have to check.

I have already coded pg_migrator to get the 8.4 template0 locale and
check for a match of that against pg_controldata of 8.3. Are you saying
I need to spin through all the databases after they are created and
check again?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#20)
Re: Some 8.4 changes needed according to pg_migrator testing

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

One problem that just occurred to me is that this solution may not be
adequate for pg_migrator. It will need to check that the new database
has the same "default" settings (where "default" means "those of
template0") as the old installation did. I think that's probably doable
but we'll have to check.

I have already coded pg_migrator to get the 8.4 template0 locale and
check for a match of that against pg_controldata of 8.3. Are you saying
I need to spin through all the databases after they are created and
check again?

No, that sounds like it should be sufficient ... but you need to check
the template0 encodings match too, assuming that we make encoding and
locale work the same here.

regards, tom lane

#22Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#21)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

One problem that just occurred to me is that this solution may not be
adequate for pg_migrator. It will need to check that the new database
has the same "default" settings (where "default" means "those of
template0") as the old installation did. I think that's probably doable
but we'll have to check.

I have already coded pg_migrator to get the 8.4 template0 locale and
check for a match of that against pg_controldata of 8.3. Are you saying
I need to spin through all the databases after they are created and
check again?

No, that sounds like it should be sufficient ... but you need to check
the template0 encodings match too, assuming that we make encoding and
locale work the same here.

OK, are you suggesting checking the pg_database.encoding for
template0 for both servers because that information isn't in either
pg_controldatas?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#22)
Re: Some 8.4 changes needed according to pg_migrator testing

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

No, that sounds like it should be sufficient ... but you need to check
the template0 encodings match too, assuming that we make encoding and
locale work the same here.

OK, are you suggesting checking the pg_database.encoding for
template0 for both servers because that information isn't in either
pg_controldatas?

Right. Compare old template0 encoding against new template0 encoding,
and old locale information from pg_controldata against new template0
locale information. If these match then it will be safe to let
pg_dumpall omit the corresponding CREATE DATABASE parameters.

regards, tom lane

#24Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)
Re: Some 8.4 changes needed according to pg_migrator testing

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

No, that sounds like it should be sufficient ... but you need to check
the template0 encodings match too, assuming that we make encoding and
locale work the same here.

OK, are you suggesting checking the pg_database.encoding for
template0 for both servers because that information isn't in either
pg_controldatas?

Right. Compare old template0 encoding against new template0 encoding,
and old locale information from pg_controldata against new template0
locale information. If these match then it will be safe to let
pg_dumpall omit the corresponding CREATE DATABASE parameters.

OK, encoding check added to pg_migrator.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +