pg_upgrade and rsync

Started by Bruce Momjianover 11 years ago88 messageshackers

bruce@momjian.us

over 11 years ago

It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

However, Josh Berkus reported that he found that hint bits set on the
master server but not on the standby servers made rsync too slow.

The attached pg_upgrade doc patch recommends using wal_log_hints to make
hint bits on the standby match the master. One question I have is
whether hint bits are set by read-only transactions on standby servers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 11 years ago

In reply to: Bruce Momjian (#1)

Re: pg_upgrade and rsync

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: Heikki Linnakangas (#2)

Re: pg_upgrade and rsync

On 1/22/15 2:19 PM, Heikki Linnakangas wrote:

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

over 11 years ago

In reply to: Jim Nasby (#3)

Re: pg_upgrade and rsync

On 01/22/2015 10:34 PM, Jim Nasby wrote:

On 1/22/15 2:19 PM, Heikki Linnakangas wrote:

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.

Then the page will be updated without writing a WAL record. Just like in
the master, if wal_log_hints is off. wal_log_hints works the same on the
master or the standby.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Heikki Linnakangas (#4)

Re: pg_upgrade and rsync

On Thu, Jan 22, 2015 at 10:48:37PM +0200, Heikki Linnakangas wrote:

* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.

Then the page will be updated without writing a WAL record. Just
like in the master, if wal_log_hints is off. wal_log_hints works the
same on the master or the standby.

[ see below for why this entire idea might not work ]

OK, I was confused by your previous "no". It means we do update hint
bits on read-only slave queries --- we just don't WAL log them. In
fact, we can't update hint bits on the standby if we are wal logging
them ---- is that right?

My text was saying:

these differences can be reduced by using a fresh standby and by
enabling <xref linkend="guc-wal-log-hints">. (While
<varname>wal_log_hints</> transfers hint bits from the primary to
standbys, additional hint bits are still set on the standbys by
read-only queries.)

meaning if you don't run any read-only queries on the standby, the files
will be same on master/standby because the hint bits will be the same,
and rsync will not copy the files.

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I would really like to get a way to pg_upgrade the standbys but we have
never really be able to get a solution. Ideally we would update just
the system table files, and if the order of pg_upgrade file renames is
exactly the same, everything else would match, but I can't imagine what
such an API would look like. Have pg_upgrade spit out a list of files
to be copied?

In fact, these are the relfilenodes pg_upgrade preserves:

* While pg_class.oid and pg_class.relfilenode are initially the same
* in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
* FULL. In the new cluster, pg_class.oid and pg_class.relfilenode will
* be the same and will match the old pg_class.oid value. Because of
* this, old/new pg_class.relfilenode values will not match if CLUSTER,
* REINDEX, or VACUUM FULL have been performed in the old cluster.
*
* We control all assignments of pg_type.oid because these oids are stored
* in user composite type values.
*
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
* We control all assignments of pg_authid.oid because these oids are stored
* in pg_largeobject_metadata.

so if the table/index relfilenodes no longer match the oid on the old
cluster, due to CLUSTER, REINDEX, or VACUUM FULL, the file name will not
match on the new cluster and rsync will copy the entire file. In fact,
rsync is going to copy it to the wrong file name, and delete the right
file.

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well. However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: Bruce Momjian (#5)

Re: pg_upgrade and rsync

On 1/22/15 5:43 PM, Bruce Momjian wrote:

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I don't think so. AIUI if the timestamps are different the very next thing it does is run the checksum (which is expensive). So --checksum is just going to hurt.

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well. However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

We've previously talked about required steps before an upgrade; perhaps we need a way to force an OID/relfilenode change on the old server prior to upgrade.

Or, thinking outside the box here... could this type of stuff be done in postgres itself so we could generate wal that's shipped to standby's? That would allow doing this as part of the formal upgrade process without the need for preliminary steps.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Bruce Momjian (#1)

Re: pg_upgrade and rsync

On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:

It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

Isn't that a pretty crazy procedure? If you need to shut down all
servers anyway, you can just rsync after having run pg_upgrade on the
master, no? Rsync won't really transfer less just because you ran a
similar thing on the standby.

Even if this would allow to avoid some traffic for fsync: There's
absolutely no guarantee that the standby's pg_upgrade results in a all
that similar data directory. Far from everything in postgres is
deterministic - it's easy to hit timing differences that result in
noticeable differences.

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Jim Nasby (#6)

Re: pg_upgrade and rsync

On Thu, Jan 22, 2015 at 06:04:24PM -0600, Jim Nasby wrote:

On 1/22/15 5:43 PM, Bruce Momjian wrote:

This brings up the other problem that the mod times of the files
are likely to be different between master and slave --- should I
recommend to only use rsync --checksum?

I don't think so. AIUI if the timestamps are different the very next
thing it does is run the checksum (which is expensive). So --checksum
is just going to hurt.

Oh, OK, good.

I am going to now conclude that rsync is never going to work for
this, unless we have pg_upgrade preserve relfilenodes as well.
However, I am not even sure that is possible due to conflicts with
system table relfilenodes created in the new cluster.

We've previously talked about required steps before an upgrade;
perhaps we need a way to force an OID/relfilenode change on the old
server prior to upgrade.

Actually, the idea I had forgotten is that we are not rsyncing between
old and new clusters here, but between two servers who are both new
after running pg_upgrade. Their relfilenodes match their oid, and the
oids match, so we should be fine.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Andres Freund (#7)

Re: pg_upgrade and rsync

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:

It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

Isn't that a pretty crazy procedure? If you need to shut down all

Yes, it is crazy, but so is pg_upgrade. ;-)

servers anyway, you can just rsync after having run pg_upgrade on the
master, no? Rsync won't really transfer less just because you ran a
similar thing on the standby.

Uh, yeah, it will, because the files can get renamed as part of the
upgrade (relfilenode now matches oid), so by running the upgrade, file
names are going to match up. I didn't think rsync could handle renaming
of files without recopying the entire file.

Even if this would allow to avoid some traffic for fsync: There's
absolutely no guarantee that the standby's pg_upgrade results in a all
that similar data directory. Far from everything in postgres is
deterministic - it's easy to hit timing differences that result in
noticeable differences.

Right, some non-deterministic things would change, but I thought
runnning upgrade on the standby would help. However, now that I think
of it, we don't preserver the database directory name and assume
dbs will will get the same oid and therefore same database directory
name on both, but if you use -j, things are going to happen in random
order. Oops.

Oh well.

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I am thinking the fix for standys would be similar to what we recommand
for upgrades with link mode using a rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again. However, at that point, you might
as well just take a base backup and be done with it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Bruce Momjian (#9)

Re: pg_upgrade and rsync

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only. I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic. It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else. I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

Thanks,

Stephen

#11

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Stephen Frost (#10)

Re: pg_upgrade and rsync

On 1/22/15 8:54 PM, Stephen Frost wrote:

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only. I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic. It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else. I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

I would definitely not feel comfortable using --size-only.

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used. This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

That means the rsync hot, then rsync cold method of updating a standby
is not *guaranteed* to work unless checksums are used. This may seem
like an edge case, but for a small, active database it looks like it
could be a real issue.

--
- David Steele
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: David Steele (#11)

Re: pg_upgrade and rsync

* David Steele (david@pgmasters.net) wrote:

On 1/22/15 8:54 PM, Stephen Frost wrote:

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only. I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic. It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else. I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

I would definitely not feel comfortable using --size-only.

Yeah, it also occurs to me that if any of the catalog tables end up
being the same size between the master and the replica that they
wouldn't get copied and that'd make for one very interesting result, and
not a good one.

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used. This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

That means the rsync hot, then rsync cold method of updating a standby
is not *guaranteed* to work unless checksums are used. This may seem
like an edge case, but for a small, active database it looks like it
could be a real issue.

That's certainly a good point though.

Thanks!

Stephen

#13

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Stephen Frost (#12)

Re: pg_upgrade and rsync

On 1/22/15 10:05 PM, Stephen Frost wrote:

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used. This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

Indeed. This was really directed more at what Bruce said:

--
- David Steele
david@pgmasters.net

#14

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: Stephen Frost (#10)

Re: pg_upgrade and rsync

On 1/22/15 7:54 PM, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only.

What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had the same timestamp as the original file?

That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Jim Nasby (#14)

Re: pg_upgrade and rsync

* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:

On 1/22/15 7:54 PM, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only.

What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had the same timestamp as the original file?

So, two things, I chatted w/ Bruce and he was less concerned about the
lack of being able to match up the timestamps than I was. He has a
point though- the catalog tables are going to get copied anyway since
they won't be hard links and checking that all the other files match in
size and that both the master and the standby are at the same xlog
position should give you a pretty good feeling that everything matches
up sufficiently.

Second, I don't follow what you mean by having pg_upgrade change the
hardlinks to have the same timestamp- for starters, the timestamp is in
the inode and not the actual hard link (two files hard linked together
won't have different timestamps..) and second, the problem isn't on the
master side- it's on the standby side. The standby's files will have
timestamps different from the master and there really isn't much to be
done about that.

That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.

The race condition is a problem for pg_start/stop_backup and friends.
In this instance, everything will be shut down when the rsync is
running, so there isn't a timestamp race condition to worry about.

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work...

Yeah, I had suggested that to Bruce also, but it's not clear why that
would be any different from an rsync --size-only in the end, presuming
everything went according to plan.

Thanks,

Stephen

#16

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Stephen Frost (#10)

Re: pg_upgrade and rsync

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Andres Freund (#16)

Re: pg_upgrade and rsync

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

That's an entirely orthogonal discussion from the original one though,
no? That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby. Even if the relfilenodes were the same across the upgrade,
I don't think it'd be a good idea to run pg_upgrade on the standby and
hope the results match close enough to the master that you can trust
updates to the catalog tables on the standby from the master going
forward to work..

Trying to pg_upgrade both the master and the standby, to me at least,
seems like an even *worse* approach than trusting rsync with -H and
--size-only..

Thanks,

Stephen

#18

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Stephen Frost (#17)

Re: pg_upgrade and rsync

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

That's an entirely orthogonal discussion from the original one though,
no?

Don't think so.

That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

Trying to pg_upgrade both the master and the standby, to me at least,
seems like an even *worse* approach than trusting rsync with -H and
--size-only..

I think running pg_upgrade on the standby is a dangerous folly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Andres Freund (#18)

Re: pg_upgrade and rsync

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:

That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

Uh, pg_upgrade always either creates a hard link tree or copies
everything over. If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.

I don't really think I'd want to go there either..

Thanks,

Stephen

#20

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Stephen Frost (#19)

Re: pg_upgrade and rsync

On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:

That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

Uh, pg_upgrade always either creates a hard link tree or copies
everything over.

Yes. The problem is that the filenames after pg_upgrade aren't the same
as before. Which means that a simple rsync call won't be able to save
anything because the standby's filenames differ. What you can do is
rsync both cluster directories (i.e. the old and the post pg_upgrade
ones) and use rsync -H, right? Without transferring both -H won't detect
the hardlinks as they need to be in the synced set. That's pretty
cumbersome/complicated, and far from cheap.

If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.

No. Except that it'd preserve the relfilenodes (i.e. the filenames of
relations) it'd work exactly the same as today. The standby is simply
updated by rsyncing the new data directory of the primary to the
standby.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Andres Freund (#20)

#22

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Stephen Frost (#21)

#23

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Andres Freund (#22)

#24

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Stephen Frost (#23)

#25

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Stephen Frost (#23)

#26

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: Stephen Frost (#15)

#27

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Jim Nasby (#26)

#28

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Jim Nasby (#26)

#29

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: David Steele (#28)

#30

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Andres Freund (#16)

#31

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Bruce Momjian (#25)

#32

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Robert Haas (#31)

#33

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Robert Haas (#30)

#34

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Stephen Frost (#32)

#35

Robert Haas

robertmhaas@gmail.com

over 11 years ago

In reply to: Tom Lane (#33)

#36

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Robert Haas (#35)

#37

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Robert Haas (#34)

#38

Andres Freund

andres@anarazel.de

over 11 years ago

In reply to: Robert Haas (#35)

#39

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Tom Lane (#36)

#40

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: Stephen Frost (#37)

#41

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Jim Nasby (#40)

#42

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Stephen Frost (#27)

#43

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Stephen Frost (#32)

#44

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Bruce Momjian (#42)

#45

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: David Steele (#44)

#46

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Bruce Momjian (#45)

#47

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Jim Nasby (#40)

#48

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Bruce Momjian (#42)

#49

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Bruce Momjian (#43)

#50

Josh Berkus

josh@agliodbs.com

over 11 years ago

In reply to: Andres Freund (#18)

#51

Josh Berkus

josh@agliodbs.com

over 11 years ago

In reply to: Andres Freund (#18)

#52

Josh Berkus

josh@agliodbs.com

over 11 years ago

In reply to: Andres Freund (#18)

#53

Stephen Frost

sfrost@snowman.net

over 11 years ago

In reply to: Josh Berkus (#52)

#54

Josh Berkus

josh@agliodbs.com

over 11 years ago

In reply to: Robert Haas (#31)

#55

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: David Steele (#46)

#56

Andrew Dunstan

andrew@dunslane.net

over 11 years ago

In reply to: Josh Berkus (#54)

#57

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Andrew Dunstan (#56)

#58

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Josh Berkus (#54)

#59

Andrew Dunstan

andrew@dunslane.net

over 11 years ago

In reply to: Bruce Momjian (#57)

#60

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Andrew Dunstan (#59)

#61

Josh Berkus

josh@agliodbs.com

over 11 years ago

In reply to: Josh Berkus (#51)

#62

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Bruce Momjian (#57)

#63

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Josh Berkus (#61)

#64

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Bruce Momjian (#55)

#65

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: David Steele (#63)

#66

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Jim Nasby (#65)

#67

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: David Steele (#66)

#68

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Jim Nasby (#67)

#69

Jim Nasby

Jim.Nasby@BlueTreble.com

over 11 years ago

In reply to: David Steele (#68)

#70

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Jim Nasby (#69)

#71

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Josh Berkus (#54)

#72

David Steele

david@pgmasters.net

over 11 years ago

In reply to: Bruce Momjian (#71)

#73

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: David Steele (#72)

#74

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Bruce Momjian (#73)

#75

Vladimir Borodin

root@simply.name

over 11 years ago

In reply to: Bruce Momjian (#74)

#76

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Vladimir Borodin (#75)

#77

Vladimir Borodin

root@simply.name

about 11 years ago

In reply to: Bruce Momjian (#76)

#78

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Vladimir Borodin (#77)

#79

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Bruce Momjian (#78)

#80

Vladimir Borodin

root@simply.name

about 11 years ago

In reply to: Bruce Momjian (#78)

#81

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Vladimir Borodin (#80)

#82

Vladimir Borodin

root@simply.name

about 11 years ago

In reply to: Bruce Momjian (#81)

#83

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Vladimir Borodin (#82)

#84

Vladimir Borodin

root@simply.name

about 11 years ago

In reply to: Bruce Momjian (#83)

#85

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Vladimir Borodin (#84)

#86

Vladimir Borodin

root@simply.name

about 11 years ago

In reply to: Bruce Momjian (#85)

#87

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Vladimir Borodin (#86)

#88

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Bruce Momjian (#87)

pg_upgrade and rsync

Attachments: