pg_upgrade and rsync

Started by Bruce Momjianabout 11 years ago88 messageshackers
Jump to latest
#1Bruce Momjian
bruce@momjian.us

It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

However, Josh Berkus reported that he found that hint bits set on the
master server but not on the standby servers made rsync too slow.

The attached pg_upgrade doc patch recommends using wal_log_hints to make
hint bits on the standby match the master. One question I have is
whether hint bits are set by read-only transactions on standby servers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachments:

rsync.difftext/x-diff; charset=us-asciiDownload+8-8
#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#1)
Re: pg_upgrade and rsync

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Heikki Linnakangas (#2)
Re: pg_upgrade and rsync

On 1/22/15 2:19 PM, Heikki Linnakangas wrote:

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Jim Nasby (#3)
Re: pg_upgrade and rsync

On 01/22/2015 10:34 PM, Jim Nasby wrote:

On 1/22/15 2:19 PM, Heikki Linnakangas wrote:

On 01/22/2015 09:20 PM, Bruce Momjian wrote:

One question I have is whether hint bits are set by read-only
transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

/*
* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.

Then the page will be updated without writing a WAL record. Just like in
the master, if wal_log_hints is off. wal_log_hints works the same on the
master or the standby.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#4)
Re: pg_upgrade and rsync

On Thu, Jan 22, 2015 at 10:48:37PM +0200, Heikki Linnakangas wrote:

* If we need to protect hint bit updates from torn writes, WAL-log a
* full page image of the page. This full page image is only necessary
* if the hint bit update is the first change to the page since the
* last checkpoint.
*
* We don't check full_page_writes here because that logic is included
* when we call XLogInsert() since the value changes dynamically.
*/
if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
{
/*
* If we're in recovery we cannot dirty a page because of a hint.
* We can set the hint, just not dirty the page as a result so the
* hint is lost when we evict the page or shutdown.
*
* See src/backend/storage/page/README for longer discussion.
*/
if (RecoveryInProgress())
return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.

Then the page will be updated without writing a WAL record. Just
like in the master, if wal_log_hints is off. wal_log_hints works the
same on the master or the standby.

[ see below for why this entire idea might not work ]

OK, I was confused by your previous "no". It means we do update hint
bits on read-only slave queries --- we just don't WAL log them. In
fact, we can't update hint bits on the standby if we are wal logging
them ---- is that right?

My text was saying:

these differences can be reduced by using a fresh standby and by
enabling <xref linkend="guc-wal-log-hints">. (While
<varname>wal_log_hints</> transfers hint bits from the primary to
standbys, additional hint bits are still set on the standbys by
read-only queries.)

meaning if you don't run any read-only queries on the standby, the files
will be same on master/standby because the hint bits will be the same,
and rsync will not copy the files.

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I would really like to get a way to pg_upgrade the standbys but we have
never really be able to get a solution. Ideally we would update just
the system table files, and if the order of pg_upgrade file renames is
exactly the same, everything else would match, but I can't imagine what
such an API would look like. Have pg_upgrade spit out a list of files
to be copied?

In fact, these are the relfilenodes pg_upgrade preserves:

* While pg_class.oid and pg_class.relfilenode are initially the same
* in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
* FULL. In the new cluster, pg_class.oid and pg_class.relfilenode will
* be the same and will match the old pg_class.oid value. Because of
* this, old/new pg_class.relfilenode values will not match if CLUSTER,
* REINDEX, or VACUUM FULL have been performed in the old cluster.
*
* We control all assignments of pg_type.oid because these oids are stored
* in user composite type values.
*
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
* We control all assignments of pg_authid.oid because these oids are stored
* in pg_largeobject_metadata.

so if the table/index relfilenodes no longer match the oid on the old
cluster, due to CLUSTER, REINDEX, or VACUUM FULL, the file name will not
match on the new cluster and rsync will copy the entire file. In fact,
rsync is going to copy it to the wrong file name, and delete the right
file.

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well. However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#5)
Re: pg_upgrade and rsync

On 1/22/15 5:43 PM, Bruce Momjian wrote:

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I don't think so. AIUI if the timestamps are different the very next thing it does is run the checksum (which is expensive). So --checksum is just going to hurt.

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well. However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

We've previously talked about required steps before an upgrade; perhaps we need a way to force an OID/relfilenode change on the old server prior to upgrade.

Or, thinking outside the box here... could this type of stuff be done in postgres itself so we could generate wal that's shipped to standby's? That would allow doing this as part of the formal upgrade process without the need for preliminary steps.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#1)
Re: pg_upgrade and rsync

On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:

It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

Isn't that a pretty crazy procedure? If you need to shut down all
servers anyway, you can just rsync after having run pg_upgrade on the
master, no? Rsync won't really transfer less just because you ran a
similar thing on the standby.

Even if this would allow to avoid some traffic for fsync: There's
absolutely no guarantee that the standby's pg_upgrade results in a all
that similar data directory. Far from everything in postgres is
deterministic - it's easy to hit timing differences that result in
noticeable differences.

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#6)
Re: pg_upgrade and rsync

On Thu, Jan 22, 2015 at 06:04:24PM -0600, Jim Nasby wrote:

On 1/22/15 5:43 PM, Bruce Momjian wrote:

This brings up the other problem that the mod times of the files
are likely to be different between master and slave --- should I
recommend to only use rsync --checksum?

I don't think so. AIUI if the timestamps are different the very next
thing it does is run the checksum (which is expensive). So --checksum
is just going to hurt.

Oh, OK, good.

I am going to now conclude that rsync is never going to work for
this, unless we have pg_upgrade preserve relfilenodes as well.
However, I am not even sure that is possible due to conflicts with
system table relfilenodes created in the new cluster.

We've previously talked about required steps before an upgrade;
perhaps we need a way to force an OID/relfilenode change on the old
server prior to upgrade.

Actually, the idea I had forgotten is that we are not rsyncing between
old and new clusters here, but between two servers who are both new
after running pg_upgrade. Their relfilenodes match their oid, and the
oids match, so we should be fine.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#7)
Re: pg_upgrade and rsync

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:

It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

Isn't that a pretty crazy procedure? If you need to shut down all

Yes, it is crazy, but so is pg_upgrade. ;-)

servers anyway, you can just rsync after having run pg_upgrade on the
master, no? Rsync won't really transfer less just because you ran a
similar thing on the standby.

Uh, yeah, it will, because the files can get renamed as part of the
upgrade (relfilenode now matches oid), so by running the upgrade, file
names are going to match up. I didn't think rsync could handle renaming
of files without recopying the entire file.

Even if this would allow to avoid some traffic for fsync: There's
absolutely no guarantee that the standby's pg_upgrade results in a all
that similar data directory. Far from everything in postgres is
deterministic - it's easy to hit timing differences that result in
noticeable differences.

Right, some non-deterministic things would change, but I thought
runnning upgrade on the standby would help. However, now that I think
of it, we don't preserver the database directory name and assume
dbs will will get the same oid and therefore same database directory
name on both, but if you use -j, things are going to happen in random
order. Oops.

Oh well.

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I am thinking the fix for standys would be similar to what we recommand
for upgrades with link mode using a rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again. However, at that point, you might
as well just take a base backup and be done with it.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#9)
Re: pg_upgrade and rsync

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only. I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic. It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else. I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

Thanks,

Stephen

#11David Steele
david@pgmasters.net
In reply to: Stephen Frost (#10)
Re: pg_upgrade and rsync

On 1/22/15 8:54 PM, Stephen Frost wrote:

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only. I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic. It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else. I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

I would definitely not feel comfortable using --size-only.

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used. This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

That means the rsync hot, then rsync cold method of updating a standby
is not *guaranteed* to work unless checksums are used. This may seem
like an edge case, but for a small, active database it looks like it
could be a real issue.

--
- David Steele
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Stephen Frost
sfrost@snowman.net
In reply to: David Steele (#11)
Re: pg_upgrade and rsync

* David Steele (david@pgmasters.net) wrote:

On 1/22/15 8:54 PM, Stephen Frost wrote:

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only. I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic. It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else. I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.

I would definitely not feel comfortable using --size-only.

Yeah, it also occurs to me that if any of the catalog tables end up
being the same size between the master and the replica that they
wouldn't get copied and that'd make for one very interesting result, and
not a good one.

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used. This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

That means the rsync hot, then rsync cold method of updating a standby
is not *guaranteed* to work unless checksums are used. This may seem
like an edge case, but for a small, active database it looks like it
could be a real issue.

That's certainly a good point though.

Thanks!

Stephen

#13David Steele
david@pgmasters.net
In reply to: Stephen Frost (#12)
Re: pg_upgrade and rsync

On 1/22/15 10:05 PM, Stephen Frost wrote:

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used. This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

Indeed. This was really directed more at what Bruce said:

I am thinking the fix for standys would be similar to what we recommand
for upgrades with link mode using a rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again. However, at that point, you might
as well just take a base backup and be done with it.

--
- David Steele
david@pgmasters.net

#14Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Stephen Frost (#10)
Re: pg_upgrade and rsync

On 1/22/15 7:54 PM, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only.

What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had the same timestamp as the original file?

That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Stephen Frost
sfrost@snowman.net
In reply to: Jim Nasby (#14)
Re: pg_upgrade and rsync

* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:

On 1/22/15 7:54 PM, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ. You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only.

What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had the same timestamp as the original file?

So, two things, I chatted w/ Bruce and he was less concerned about the
lack of being able to match up the timestamps than I was. He has a
point though- the catalog tables are going to get copied anyway since
they won't be hard links and checking that all the other files match in
size and that both the master and the standby are at the same xlog
position should give you a pretty good feeling that everything matches
up sufficiently.

Second, I don't follow what you mean by having pg_upgrade change the
hardlinks to have the same timestamp- for starters, the timestamp is in
the inode and not the actual hard link (two files hard linked together
won't have different timestamps..) and second, the problem isn't on the
master side- it's on the standby side. The standby's files will have
timestamps different from the master and there really isn't much to be
done about that.

That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.

The race condition is a problem for pg_start/stop_backup and friends.
In this instance, everything will be shut down when the rsync is
running, so there isn't a timestamp race condition to worry about.

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work...

Yeah, I had suggested that to Bruce also, but it's not clear why that
would be any different from an rsync --size-only in the end, presuming
everything went according to plan.

Thanks,

Stephen

#16Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#10)
Re: pg_upgrade and rsync

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#16)
Re: pg_upgrade and rsync

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

That's an entirely orthogonal discussion from the original one though,
no? That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby. Even if the relfilenodes were the same across the upgrade,
I don't think it'd be a good idea to run pg_upgrade on the standby and
hope the results match close enough to the master that you can trust
updates to the catalog tables on the standby from the master going
forward to work..

Trying to pg_upgrade both the master and the standby, to me at least,
seems like an even *worse* approach than trusting rsync with -H and
--size-only..

Thanks,

Stephen

#18Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#17)
Re: pg_upgrade and rsync

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:

* Bruce Momjian (bruce@momjian.us) wrote:

On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be. If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links. rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

That's an entirely orthogonal discussion from the original one though,
no?

Don't think so.

That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

Trying to pg_upgrade both the master and the standby, to me at least,
seems like an even *worse* approach than trusting rsync with -H and
--size-only..

I think running pg_upgrade on the standby is a dangerous folly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#18)
Re: pg_upgrade and rsync

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:

That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

Uh, pg_upgrade always either creates a hard link tree or copies
everything over. If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.

I don't really think I'd want to go there either..

Thanks,

Stephen

#20Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#19)
Re: pg_upgrade and rsync

On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:

* Andres Freund (andres@2ndquadrant.com) wrote:

On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:

That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

Uh, pg_upgrade always either creates a hard link tree or copies
everything over.

Yes. The problem is that the filenames after pg_upgrade aren't the same
as before. Which means that a simple rsync call won't be able to save
anything because the standby's filenames differ. What you can do is
rsync both cluster directories (i.e. the old and the post pg_upgrade
ones) and use rsync -H, right? Without transferring both -H won't detect
the hardlinks as they need to be in the synced set. That's pretty
cumbersome/complicated, and far from cheap.

If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.

No. Except that it'd preserve the relfilenodes (i.e. the filenames of
relations) it'd work exactly the same as today. The standby is simply
updated by rsyncing the new data directory of the primary to the
standby.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#20)
#22Andres Freund
andres@anarazel.de
In reply to: Stephen Frost (#21)
#23Stephen Frost
sfrost@snowman.net
In reply to: Andres Freund (#22)
#24Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#23)
#26Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Stephen Frost (#15)
#27Stephen Frost
sfrost@snowman.net
In reply to: Jim Nasby (#26)
#28David Steele
david@pgmasters.net
In reply to: Jim Nasby (#26)
#29Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Steele (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#16)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#25)
#32Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#30)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#32)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#33)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
#37Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#34)
#38Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#35)
#39Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#36)
#40Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Stephen Frost (#37)
#41David Steele
david@pgmasters.net
In reply to: Jim Nasby (#40)
#42Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#27)
#43Bruce Momjian
bruce@momjian.us
In reply to: Stephen Frost (#32)
#44David Steele
david@pgmasters.net
In reply to: Bruce Momjian (#42)
#45Bruce Momjian
bruce@momjian.us
In reply to: David Steele (#44)
#46David Steele
david@pgmasters.net
In reply to: Bruce Momjian (#45)
#47Stephen Frost
sfrost@snowman.net
In reply to: Jim Nasby (#40)
#48Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#42)
#49Stephen Frost
sfrost@snowman.net
In reply to: Bruce Momjian (#43)
#50Josh Berkus
josh@agliodbs.com
In reply to: Andres Freund (#18)
#51Josh Berkus
josh@agliodbs.com
In reply to: Andres Freund (#18)
#52Josh Berkus
josh@agliodbs.com
In reply to: Andres Freund (#18)
#53Stephen Frost
sfrost@snowman.net
In reply to: Josh Berkus (#52)
#54Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#31)
#55Bruce Momjian
bruce@momjian.us
In reply to: David Steele (#46)
#56Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#54)
#57Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#56)
#58Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#54)
#59Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#57)
#60Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#59)
#61Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#51)
#62David Steele
david@pgmasters.net
In reply to: Bruce Momjian (#57)
#63David Steele
david@pgmasters.net
In reply to: Josh Berkus (#61)
#64David Steele
david@pgmasters.net
In reply to: Bruce Momjian (#55)
#65Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Steele (#63)
#66David Steele
david@pgmasters.net
In reply to: Jim Nasby (#65)
#67Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Steele (#66)
#68David Steele
david@pgmasters.net
In reply to: Jim Nasby (#67)
#69Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Steele (#68)
#70David Steele
david@pgmasters.net
In reply to: Jim Nasby (#69)
#71Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#54)
#72David Steele
david@pgmasters.net
In reply to: Bruce Momjian (#71)
#73Bruce Momjian
bruce@momjian.us
In reply to: David Steele (#72)
#74Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#73)
#75Vladimir Borodin
root@simply.name
In reply to: Bruce Momjian (#74)
#76Bruce Momjian
bruce@momjian.us
In reply to: Vladimir Borodin (#75)
#77Vladimir Borodin
root@simply.name
In reply to: Bruce Momjian (#76)
#78Bruce Momjian
bruce@momjian.us
In reply to: Vladimir Borodin (#77)
#79Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#78)
#80Vladimir Borodin
root@simply.name
In reply to: Bruce Momjian (#78)
#81Bruce Momjian
bruce@momjian.us
In reply to: Vladimir Borodin (#80)
#82Vladimir Borodin
root@simply.name
In reply to: Bruce Momjian (#81)
#83Bruce Momjian
bruce@momjian.us
In reply to: Vladimir Borodin (#82)
#84Vladimir Borodin
root@simply.name
In reply to: Bruce Momjian (#83)
#85Bruce Momjian
bruce@momjian.us
In reply to: Vladimir Borodin (#84)
#86Vladimir Borodin
root@simply.name
In reply to: Bruce Momjian (#85)
#87Bruce Momjian
bruce@momjian.us
In reply to: Vladimir Borodin (#86)
#88Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#87)