sequences and pg_upgrade

Started by Peter Eisentrautover 9 years ago13 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

I was toying with a couple of ideas that would involve changing the
storage of sequences. (Say, for the sake of discussion, removing the
problematic/useless sequence_name field.) This would cause problems for
pg_upgrade, because pg_upgrade copies the "heap" storage of sequences
like it does for normal tables, and we have no facilities for effecting
any changes during that.

There was a previous discussion in the early days of pg_migrator, which
resulted in the current behavior:
/messages/by-id/20090713220112.GF7933@klana.box

This also alluded to what I think was the last change in the sequence
storage format (10a3471bed7b57fb986a5be8afdee5f0dda419de) between
versions 8.3 and 8.4. How did pg_upgrade handle that?

I think the other solution mentioned in that thread would also work:
Have pg_upgrade treat sequences more like system catalogs, whose format
changes between major releases, and transferred them via the
dump/restore route. So instead of copying the disk files, issue a
setval call, and the sequence should be all set up.

Am I missing anything?

Attached is a rough patch set that would implement that.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-pg_dump-Separate-table-data-and-sequence-data-object.patchtext/x-patch; name=0001-pg_dump-Separate-table-data-and-sequence-data-object.patchDownload+15-5
0002-pg_dump-Add-sequence-data-option.patchtext/x-patch; name=0002-pg_dump-Add-sequence-data-option.patchDownload+29-2
0003-pg_upgrade-Skip-copying-sequence-files.patchtext/x-patch; name=0003-pg_upgrade-Skip-copying-sequence-files.patchDownload+2-3
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: sequences and pg_upgrade

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

I was toying with a couple of ideas that would involve changing the
storage of sequences. (Say, for the sake of discussion, removing the
problematic/useless sequence_name field.) This would cause problems for
pg_upgrade, because pg_upgrade copies the "heap" storage of sequences
like it does for normal tables, and we have no facilities for effecting
any changes during that.

There was a previous discussion in the early days of pg_migrator, which
resulted in the current behavior:
/messages/by-id/20090713220112.GF7933@klana.box

This also alluded to what I think was the last change in the sequence
storage format (10a3471bed7b57fb986a5be8afdee5f0dda419de) between
versions 8.3 and 8.4. How did pg_upgrade handle that?

I think it probably never did handle that. pg_upgrade doesn't currently
claim to support migrating from 8.3, and the thread you mention shows that
the original attempt at 8.3->8.4 migration crashed-and-burned for numerous
unrelated reasons. We may not have ever got to the point of noticing that
10a3471be also created a problem.

I think the other solution mentioned in that thread would also work:
Have pg_upgrade treat sequences more like system catalogs, whose format
changes between major releases, and transferred them via the
dump/restore route. So instead of copying the disk files, issue a
setval call, and the sequence should be all set up.

Seems reasonable.

If you're proposing to expose --sequence-data as a user-visible option,
the patch set lacks documentation. But I wonder whether it shouldn't
simply be a side-effect of --binary-upgrade. It seems a tad
non-orthogonal for a user switch.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#1)
Re: sequences and pg_upgrade

On Tue, Aug 30, 2016 at 08:46:48AM -0400, Peter Eisentraut wrote:

I think the other solution mentioned in that thread would also work:
Have pg_upgrade treat sequences more like system catalogs, whose format
changes between major releases, and transferred them via the
dump/restore route. So instead of copying the disk files, issue a
setval call, and the sequence should be all set up.

Am I missing anything?

Looks straight-forward to me.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Peter Eisentraut (#1)
Re: sequences and pg_upgrade

On 2016-08-30 08:46:48 -0400, Peter Eisentraut wrote:

I was toying with a couple of ideas that would involve changing the
storage of sequences. (Say, for the sake of discussion, removing the
problematic/useless sequence_name field.)

I'd be quite interested to know what changes that are...

I think the other solution mentioned in that thread would also work:
Have pg_upgrade treat sequences more like system catalogs, whose format
changes between major releases, and transferred them via the
dump/restore route. So instead of copying the disk files, issue a
setval call, and the sequence should be all set up.

+1.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Andres Freund (#4)
Re: sequences and pg_upgrade

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, failed

Thank you for the patch.
As I see there are no objections in the discussion, all the patches look clear.

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+	if (dopt.schemaOnly && dopt.sequence_data)
+		getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

Example:
postgres=# create table t(i serial, data text);
postgres=# insert into t(data) values ('aaa');
pg_dump -d postgres --sequence-data --schema-only > ../reviews/dump_pg

Then restore it into newdb and add new value.
newdb=# insert into t(data) values ('aaa');
INSERT 0 1
newdb=# select * from t;
i | data
---+------
2 | aaa

I'm not an experienced user, but I thought that while doing dump/restore
of schema of database we reset all the data. Why should the table in newly
created (via pg_restore) database have non-default sequence value?

I also did some other tests and all of them were passed.

One more thing to do is a documentation for the new option.
You should update help() function in pg_dump.c and also add some
notes to pg_dump.sgml and probably to pgupgrade.sgml.

The new status of this patch is: Waiting on Author

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Anastasia Lubennikova (#5)
Re: sequences and pg_upgrade

On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+	if (dopt.schemaOnly && dopt.sequence_data)
+		getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

The point of this patch is that with the new option, you can dump
sequence data (but not table data) alongside with the schema. This
would be used by pg_upgrade for the reasons described at the beginning
of the thread.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Peter Eisentraut (#6)
Re: sequences and pg_upgrade

15.09.2016 15:29, Peter Eisentraut:

On 9/14/16 8:52 AM, Anastasia Lubennikova wrote:

Could you clarify, please, why do we dump sequence in schemaOnly mode?
+	if (dopt.schemaOnly && dopt.sequence_data)
+		getSequenceData(&dopt, tblinfo, numTables, dopt.oids);

The point of this patch is that with the new option, you can dump
sequence data (but not table data) alongside with the schema. This
would be used by pg_upgrade for the reasons described at the beginning
of the thread.

Oh, thank you. Now I see.
Somewhy I thought that it *always* dumps sequence data in schemaOnly mode.
Which is definitely not true.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Anastasia Lubennikova (#7)
Re: sequences and pg_upgrade

Here is an updated patch set. Compared to the initial set, I have
changed pg_dump's sorting priorities so that sequence data is always
after table data. This would otherwise have introduced a problem
because sortDataAndIndexObjectsBySize() only considers consecutive
DO_TABLE_DATA entries. Also, I have removed the separate
--sequence-data switch from pg_dump and made it implicit in
--binary-upgrade. (So the previous patches 0002 and 0003 have been
combined, because it's no longer a separate feature.)

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v2-0001-pg_dump-Separate-table-and-sequence-data-object-t.patchinvalid/octet-stream; name=v2-0001-pg_dump-Separate-table-and-sequence-data-object-t.patchDownload+37-27
v2-0002-pg_upgrade-Upgrade-sequence-data-via-pg_dump.patchinvalid/octet-stream; name=v2-0002-pg_upgrade-Upgrade-sequence-data-via-pg_dump.patchDownload+24-7
#9Anastasia Lubennikova
a.lubennikova@postgrespro.ru
In reply to: Peter Eisentraut (#8)
Re: sequences and pg_upgrade

23.09.2016 21:06, Peter Eisentraut:

Here is an updated patch set. Compared to the initial set, I have
changed pg_dump's sorting priorities so that sequence data is always
after table data. This would otherwise have introduced a problem
because sortDataAndIndexObjectsBySize() only considers consecutive
DO_TABLE_DATA entries. Also, I have removed the separate
--sequence-data switch from pg_dump and made it implicit in
--binary-upgrade. (So the previous patches 0002 and 0003 have been
combined, because it's no longer a separate feature.)

The patches are good, no complaints.
But again, I have the same question.
I was confused, why do we always dump sequence data,
because I'd overlooked the --sequence-data key. I'd rather leave this
option,
because it's quite non intuitive behaviour...
/* dump sequence data even in schema-only mode */

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Michael Paquier
michael@paquier.xyz
In reply to: Anastasia Lubennikova (#9)
Re: sequences and pg_upgrade

On Sat, Oct 1, 2016 at 1:50 AM, Anastasia Lubennikova
<a.lubennikova@postgrespro.ru> wrote:

23.09.2016 21:06, Peter Eisentraut:

Here is an updated patch set. Compared to the initial set, I have
changed pg_dump's sorting priorities so that sequence data is always
after table data. This would otherwise have introduced a problem
because sortDataAndIndexObjectsBySize() only considers consecutive
DO_TABLE_DATA entries. Also, I have removed the separate
--sequence-data switch from pg_dump and made it implicit in
--binary-upgrade. (So the previous patches 0002 and 0003 have been
combined, because it's no longer a separate feature.)

The patches are good, no complaints.
But again, I have the same question.
I was confused, why do we always dump sequence data,
because I'd overlooked the --sequence-data key. I'd rather leave this
option,
because it's quite non intuitive behaviour...
/* dump sequence data even in schema-only mode */

Moved to next CF. This is fresh.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Anastasia Lubennikova (#9)
Re: sequences and pg_upgrade

On 9/30/16 12:50 PM, Anastasia Lubennikova wrote:

The patches are good, no complaints.
But again, I have the same question.
I was confused, why do we always dump sequence data,
because I'd overlooked the --sequence-data key. I'd rather leave this
option,
because it's quite non intuitive behaviour...
/* dump sequence data even in schema-only mode */

Here are rebased patches.

Regarding your question: The initial patch had a separate option for
this behavior, which was then used by pg_upgrade. It was commented that
this option is not useful outside of pg_upgrade, so it doesn't need to
be exposed as a user-facing option. I agreed with that and removed the
option. We can always add the option back easily if someone really
wants it, but so far no use case has been presented. So I suggest we
proceed with this proposal ignoring whether this option is exposed or not.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v3-0001-pg_dump-Separate-table-and-sequence-data-object-t.patchtext/x-patch; name=v3-0001-pg_dump-Separate-table-and-sequence-data-object-t.patchDownload+25-16
v3-0002-pg_upgrade-Upgrade-sequence-data-via-pg_dump.patchtext/x-patch; name=v3-0002-pg_upgrade-Upgrade-sequence-data-via-pg_dump.patchDownload+24-7
#12Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#11)
Re: sequences and pg_upgrade

On Mon, Oct 31, 2016 at 9:53 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 9/30/16 12:50 PM, Anastasia Lubennikova wrote:

The patches are good, no complaints.
But again, I have the same question.
I was confused, why do we always dump sequence data,
because I'd overlooked the --sequence-data key. I'd rather leave this
option,
because it's quite non intuitive behaviour...
/* dump sequence data even in schema-only mode */

Here are rebased patches.

Regarding your question: The initial patch had a separate option for
this behavior, which was then used by pg_upgrade. It was commented that
this option is not useful outside of pg_upgrade, so it doesn't need to
be exposed as a user-facing option. I agreed with that and removed the
option. We can always add the option back easily if someone really
wants it, but so far no use case has been presented. So I suggest we
proceed with this proposal ignoring whether this option is exposed or not.

I had a look at those fresh patches, and 0001 looks like a good thing.
This makes the separation between sequences and table data dump
cleaner. I ran some tests with pg_upgrade and 0002, and things are
clear. And +1 for the way done in the patch, aka no options of pg_dump
exposed to user, still keep the option tracking as a separate value.

One small thing here:
 static void
-getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables, bool oids)
+getTableData(DumpOptions *dopt, TableInfo *tblinfo, int numTables,
bool oids, char relkind)
 {
    int         i;
    for (i = 0; i < numTables; i++)
    {
-       if (tblinfo[i].dobj.dump & DUMP_COMPONENT_DATA)
+       if (tblinfo[i].dobj.dump & DUMP_COMPONENT_DATA &&
+           (!relkind || tblinfo[i].relkind == relkind))
            makeTableDataInfo(dopt, &(tblinfo[i]), oids)

One idea here would be to have an extra routine, getSequenceData and
not extend getTableData() with relkind as extra argument. I am fine
with the way patch does things, so I just switched the patch as ready
for committer.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#12)
Re: sequences and pg_upgrade

On 11/2/16 2:34 AM, Michael Paquier wrote:

I had a look at those fresh patches, and 0001 looks like a good thing.
This makes the separation between sequences and table data dump
cleaner. I ran some tests with pg_upgrade and 0002, and things are
clear. And +1 for the way done in the patch, aka no options of pg_dump
exposed to user, still keep the option tracking as a separate value.

Committed, thanks.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers