pg_rewind, a tool for resynchronizing an old master after failover

Started by Heikki Linnakangasover 12 years ago29 messages
#1Heikki Linnakangas
hlinnakangas@vmware.com

Hi,

I've been hacking on a tool to allow resynchronizing an old master
server after failover. The need to do a full backup/restore has been a
common complaint ever since we've had streaming replication. I saw on
the wiki that this was discussed in the dev meeting; too bad I couldn't
make it.

In a nutshell, the idea is to do copy everything that has changed
between the cluster, like rsync does, but instead of reading through all
files, use the WAL to determine what has changed. Here's a somewhat more
detailed explanation, from the README:

Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to old, except
for the blocks that we know to be the same.

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For
each WAL record, make a note of the data blocks that are touched. This
yields a list of all the data blocks that were changed in the old
cluster, after the new cluster forked off.

2. Copy all those changed blocks from the new master to the old master.

3. Copy all other files like clog, conf files etc. from the new cluster
to old. Everything except the relation files.

4. Apply the WAL from the new master, starting from the checkpoint
created at failover. (pg_rewind doesn't actually apply the WAL, it just
creates a backup label file indicating that when PostgreSQL is started,
it will start replay from that checkpoint and apply all the required WAL)

Please take a look: https://github.com/vmware/pg_rewind

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 23 May 2013 12:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

Please take a look: https://github.com/vmware/pg_rewind

The COPYRIGHT file shows that VMware is claiming copyright on unstated
parts of the code for this. As such, its not a normal submission to
the PostgreSQL project, which involves placing copyright with the
PGDG.

As a result, while it sounds interesting, people should be aware of
that and I suggest we shouldn't discuss that code on this list, to
avoid any disputes should we decide to include a similar facility in
core Postgres in the future.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Thom Brown
thom@linux.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 23 May 2013 07:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

Hi,

I've been hacking on a tool to allow resynchronizing an old master server
after failover. The need to do a full backup/restore has been a common
complaint ever since we've had streaming replication. I saw on the wiki that
this was discussed in the dev meeting; too bad I couldn't make it.

In a nutshell, the idea is to do copy everything that has changed between
the cluster, like rsync does, but instead of reading through all files, use
the WAL to determine what has changed. Here's a somewhat more detailed
explanation, from the README:

Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to old, except for
the blocks that we know to be the same.

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

2. Copy all those changed blocks from the new master to the old master.

3. Copy all other files like clog, conf files etc. from the new cluster
to old. Everything except the relation files.

4. Apply the WAL from the new master, starting from the checkpoint
created at failover. (pg_rewind doesn't actually apply the WAL, it just
creates a backup label file indicating that when PostgreSQL is started, it
will start replay from that checkpoint and apply all the required WAL)

Please take a look: https://github.com/vmware/pg_rewind

6 instances set up:

[Primary (5530)]
|
---[Standby 1 (5531)]
|
---[Standby 2 (5532)]
|
---[Standby 3 (5533)]
|
---[Standby 4 (5534)]
|
---[Standby 5 (5535)]

1) Created a table on the primary with some data.
2) Promoted Standby 1
3) Cleanly shut down Primary
4) pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres'

Last common WAL position: 0/30227F8 on timeline 1
Last common checkpoint at 0/30227F8 on timeline 1
error reading xlog record: record with zero length at 0/3022860
Done!

Contents of pg_xlog directory in Primary and Standby 1:
thom@swift /tmp $ ls -l primary/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users 41 May 23 09:52 00000002.history
drwx------ 2 thom users 80 May 23 09:52 archive_status
thom@swift /tmp $ ls -l standby1/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:49 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:50 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users 41 May 23 09:50 00000002.history
drwx------ 2 thom users 80 May 23 09:50 archive_status

5) Changed recovery.done in primary to point its primary_conninfo port
to 5531 (that of Standby 1).
6) Renamed it to .conf.
7) Changed postgresql.conf to set the port back to its original one
(as pg_rewind has caused it to match that of Standby 1)
8) Start Primary

Latest log in primary reads:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: invalid xl_info in checkpoint record
FATAL: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file
"/tmp/primary/backup_label".
LOG: startup process (PID 31503) exited with exit code 1
LOG: aborting startup due to startup process failure

9) Okay, so I'll delete that label and try again. Now all is well:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/3022828
LOG: record with zero length at 0/3041A60
LOG: consistent recovery state reached at 0/3041A60
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 2

10) Connect to Standby 1 and insert more rows into our original table.
11) Connect to Primary and those rows are appearing.

--
Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#2)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thu, May 23, 2013 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

Yes in asynchronous replication case. But in synchronous replication case,
after WAL forks, hint bits would not be set if their corresponding commit record
is not replicated to the standby. The transaction commit keeps waiting
for the reply
from the standby before updating clog. So, this data corruption would not happen
in sync case.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Fujii Masao (#5)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thu, May 23, 2013 at 11:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, May 23, 2013 at 8:55 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

Yes in asynchronous replication case. But in synchronous replication case,
after WAL forks, hint bits would not be set if their corresponding commit record
is not replicated to the standby. The transaction commit keeps waiting
for the reply
from the standby before updating clog. So, this data corruption would not happen
in sync case.

Not necessarily. SyncRepWaitForLSN() can be interrupted via a cancel signal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Robert Haas (#2)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 23.05.2013 07:55, Robert Haas wrote:

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

Bummer, you're right. Hmm, if you have checksums enabled, however, we'll
WAL log a full-page every time a page is dirtied for setting a hint bit,
which fixes the problem. So, there's a caveat with pg_rewind; you must
have checksums enabled.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Simon Riggs (#3)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 23.05.2013 08:03, Simon Riggs wrote:

On 23 May 2013 12:10, Heikki Linnakangas<hlinnakangas@vmware.com> wrote:

Please take a look: https://github.com/vmware/pg_rewind

The COPYRIGHT file shows that VMware is claiming copyright on unstated
parts of the code for this. As such, its not a normal submission to
the PostgreSQL project, which involves placing copyright with the
PGDG.

We have a lot of code in PostgreSQL source tree with different copyright
notices, and there's no problem with that as long as the coe is licensed
under the PostgreSQL license. For patches that add or modify code in
PostgreSQL, we generally have copyright notices with just PGDG, to avoid
having a long list of copyright notices of a lot of companies and
individuals on every file. I'm no lawyer, but I believe there's no
difference from the legal point of view.

As a result, while it sounds interesting, people should be aware of
that and I suggest we shouldn't discuss that code on this list, to
avoid any disputes should we decide to include a similar facility in
core Postgres in the future.

That's just paranoia. There are a lot of tools out there on pgfoundry,
with various copyright holders and even difference licenses, and it's
fine to talk about all those on this list. Besides, the code is licensed
under the PostgreSQL license, so if someone decides we should have this
e.g in contrib, you can just grab the sources and commit. Thirdly,
there's no reason to refrain from even discussing this, even if someone
would include a similar facility in core Postgres - this is about
copyrights, not patents (and yes, this contribution has been cleared by
VMware legal department; VMware doesn't hold any patents on this)

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Heikki Linnakangas (#7)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 23.05.2013 07:55, Robert Haas wrote:

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For
each
WAL record, make a note of the data blocks that are touched. This yields
a
list of all the data blocks that were changed in the old cluster, after
the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

Bummer, you're right. Hmm, if you have checksums enabled, however, we'll
WAL log a full-page every time a page is dirtied for setting a hint bit,
which fixes the problem. So, there's a caveat with pg_rewind; you must have
checksums enabled.

I was quite impressed with the idea, but hint bits indeed are problem. I
realised the same issue also applies to the other idea that Fujii-san and
others have suggested about waiting for dirty buffers to be written until
the WAL is received at the standby. But since that idea would anyways need
to be implemented in the core, we could teach SetHintBits() to return false
unless the corresponding commit WAL records are written to the standby
first.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

#10Thom Brown
thom@linux.com
In reply to: Thom Brown (#4)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 23 May 2013 10:03, Thom Brown <thom@linux.com> wrote:

On 23 May 2013 07:10, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

Hi,

I've been hacking on a tool to allow resynchronizing an old master server
after failover. The need to do a full backup/restore has been a common
complaint ever since we've had streaming replication. I saw on the wiki that
this was discussed in the dev meeting; too bad I couldn't make it.

In a nutshell, the idea is to do copy everything that has changed between
the cluster, like rsync does, but instead of reading through all files, use
the WAL to determine what has changed. Here's a somewhat more detailed
explanation, from the README:

Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to old, except for
the blocks that we know to be the same.

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

2. Copy all those changed blocks from the new master to the old master.

3. Copy all other files like clog, conf files etc. from the new cluster
to old. Everything except the relation files.

4. Apply the WAL from the new master, starting from the checkpoint
created at failover. (pg_rewind doesn't actually apply the WAL, it just
creates a backup label file indicating that when PostgreSQL is started, it
will start replay from that checkpoint and apply all the required WAL)

Please take a look: https://github.com/vmware/pg_rewind

6 instances set up:

[Primary (5530)]
|
---[Standby 1 (5531)]
|
---[Standby 2 (5532)]
|
---[Standby 3 (5533)]
|
---[Standby 4 (5534)]
|
---[Standby 5 (5535)]

1) Created a table on the primary with some data.
2) Promoted Standby 1
3) Cleanly shut down Primary
4) pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres'

Last common WAL position: 0/30227F8 on timeline 1
Last common checkpoint at 0/30227F8 on timeline 1
error reading xlog record: record with zero length at 0/3022860
Done!

Contents of pg_xlog directory in Primary and Standby 1:
thom@swift /tmp $ ls -l primary/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:52 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users 41 May 23 09:52 00000002.history
drwx------ 2 thom users 80 May 23 09:52 archive_status
thom@swift /tmp $ ls -l standby1/pg_xlog/
total 49156
-rw------- 1 thom users 16777216 May 23 09:49 000000010000000000000002
-rw------- 1 thom users 16777216 May 23 09:50 000000010000000000000003
-rw------- 1 thom users 16777216 May 23 09:52 000000020000000000000003
-rw------- 1 thom users 41 May 23 09:50 00000002.history
drwx------ 2 thom users 80 May 23 09:50 archive_status

5) Changed recovery.done in primary to point its primary_conninfo port
to 5531 (that of Standby 1).
6) Renamed it to .conf.
7) Changed postgresql.conf to set the port back to its original one
(as pg_rewind has caused it to match that of Standby 1)
8) Start Primary

Latest log in primary reads:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: invalid xl_info in checkpoint record
FATAL: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file
"/tmp/primary/backup_label".
LOG: startup process (PID 31503) exited with exit code 1
LOG: aborting startup due to startup process failure

9) Okay, so I'll delete that label and try again. Now all is well:

LOG: database system was interrupted; last known up at 2013-05-23 09:50:34 EDT
LOG: entering standby mode
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/3022828
LOG: record with zero length at 0/3041A60
LOG: consistent recovery state reached at 0/3041A60
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 2

10) Connect to Standby 1 and insert more rows into our original table.
11) Connect to Primary and those rows are appearing.

By the way, without any data inserted I get:

thom@swift /tmp $ pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000002.history", length 41
Last common WAL position: 0/3000000 on timeline 1
pg_rewind: xlogreader.c:214: XLogReadRecord: Assertion `((RecPtr) %
8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8) - 1)) &
~((intptr_t) ((8) - 1))))' failed.
Aborted (core dumped)

And this magically goes away with -k on initdb.

--
Thom

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@2ndquadrant.com
In reply to: Thom Brown (#10)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 2013-05-24 10:57:22 -0400, Thom Brown wrote:

By the way, without any data inserted I get:

thom@swift /tmp $ pg_rewind --target-pgdata=/tmp/primary
--source-server='host=localhost port=5531 dbname=postgres' -v
connected to remote server
fetched file "global/pg_control", length 8192
fetched file "pg_xlog/00000002.history", length 41
Last common WAL position: 0/3000000 on timeline 1
pg_rewind: xlogreader.c:214: XLogReadRecord: Assertion `((RecPtr) %
8192 >= (((intptr_t) ((sizeof(XLogPageHeaderData))) + ((8) - 1)) &
~((intptr_t) ((8) - 1))))' failed.
Aborted (core dumped)

And this magically goes away with -k on initdb.

That looks like this is a bug independent of -k. pg_rewind tries to read
the beginning of a page but xlogreader doesn't allow that atm. The
likely reason this doesn't happen with -k is autovacuum probably hints
some pages which are logged when checksums are enabled.

Should we make that assert
Assert((RecPtr % XLOG_BLCKSZ == 0) ||XRecOffIsValid(RecPtr));
instead?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jim Nasby
jim@nasby.net
In reply to: Pavan Deolasee (#9)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 5/23/13 12:51 PM, Pavan Deolasee wrote:

On Thu, May 23, 2013 at 11:10 PM, Heikki Linnakangas <hlinnakangas@vmware.com <mailto:hlinnakangas@vmware.com>> wrote:

On 23.05.2013 07:55, Robert Haas wrote:

On Thu, May 23, 2013 at 7:10 AM, Heikki Linnakangas
<hlinnakangas@vmware.com <mailto:hlinnakangas@vmware.com>> wrote:

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For each
WAL record, make a note of the data blocks that are touched. This yields a
list of all the data blocks that were changed in the old cluster, after the
new cluster forked off.

Suppose that a transaction is open and has written tuples at the point
where WAL forks. After WAL forks, the transaction commits. Then, it
hints some of the tuples that it wrote. There is no record in WAL
that those blocks are changed, but failing to revert them leads to
data corruption.

Bummer, you're right. Hmm, if you have checksums enabled, however, we'll WAL log a full-page every time a page is dirtied for setting a hint bit, which fixes the problem. So, there's a caveat with pg_rewind; you must have checksums enabled.

I was quite impressed with the idea, but hint bits indeed are problem. I realised the same issue also applies to the other idea that Fujii-san and others have suggested about waiting for dirty buffers to be written until the WAL is received at the standby. But since that idea would anyways need to be implemented in the core, we could teach SetHintBits() to return false unless the corresponding commit WAL records are written to the standby first.

Would it be useful to turn this problem around? Heikki's proposal is based on being able to track (without fail) all blocks that have been modified; could we instead track blocks that we know for certain have NOT been modified? The difference there is that we can be more conservative in stating "we know this block is the same"; worst case we just do some extra copying.

<thinking out loud...>
One possibility would be to use file timestamps. For files that are past a certain age on both master and slave, if we force the timestamp on the slave to match the timestamp from the master, rsync will be able to safely ignore that file. I realize that's not as good as block-level detection, but it's probably a tremendous improvement over what we have today. The critical thing in this case would be to *guarantee* that the timestamps did not match on modified files.

Of course, screwing around with FS timestamps in this manner is pretty grotty, at least on a live system. Perhaps there's some way to track that info separately and then use it to change file timestamps before running rsync. Or if we are able to define a list of files that we think may have changed, we just feed that list to rsync and let it do the heavy lifting.
--
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Amit kapila
amit.kapila@huawei.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thursday, May 23, 2013 4:40 PM Heikki Linnakangas wrote:

Hi,

I've been hacking on a tool to allow resynchronizing an old master
server after failover. The need to do a full backup/restore has been a
common complaint ever since we've had streaming replication. I saw on
the wiki that this was discussed in the dev meeting; too bad I couldn't
make it.

In a nutshell, the idea is to do copy everything that has changed
between the cluster, like rsync does, but instead of reading through all
files, use the WAL to determine what has changed. Here's a somewhat more
detailed explanation, from the README:

This is really a nice idea and an important requirement from many users.

Does this tool handle all kind of operations user would have performaed after
forking of new cluster or it would mandate that user should not have performed certain kind of
operations on old cluster after new cluster forked off?

Theory of operation
-------------------

The basic idea is to copy everything from the new cluster to old, except
for the blocks that we know to be the same.

1. Scan the WAL log of the old cluster, starting from the point where
the new cluster's timeline history forked off from the old cluster. For
each WAL record, make a note of the data blocks that are touched. This
yields a list of all the data blocks that were changed in the old
cluster, after the new cluster forked off.

a. How about if after forking off new cluster, a new relation gets created in old cluster,
then it might not find the blocks of same in new cluster, if new cluster also got
the same name relation as old but with different data, it might get error when it start
to replay WAL of new master as mentioned in point-4.

b. How about if after forking off new cluster, an update occurs such that it has to put new
row in new heap page, now in WAL it will have mention of old and new row blocks (blk-1 and blk-2).
It might be the case new cluster will not have blk-2, so only blk-1 will be copied from new cluster,
in such scenario, it will have 2 valid versions of same row.

With Regards,
Amit Kapila.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Ants Aasma
ants@cybertec.at
In reply to: Amit kapila (#13)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Sat, May 25, 2013 at 10:05 AM, Amit kapila <amit.kapila@huawei.com> wrote:

In a nutshell, the idea is to do copy everything that has changed
between the cluster, like rsync does, but instead of reading through all
files, use the WAL to determine what has changed. Here's a somewhat more
detailed explanation, from the README:

This is really a nice idea and an important requirement from many users.

+1

Does this tool handle all kind of operations user would have performaed after
forking of new cluster or it would mandate that user should not have performed certain kind of
operations on old cluster after new cluster forked off?

Truncate and all kinds of DROP come to mind, also table rewrites from
ALTER. The tool should probably just flag those relation files to be
copied wholesale.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Bruce Momjian
bruce@momjian.us
In reply to: Heikki Linnakangas (#8)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thu, May 23, 2013 at 01:48:24PM -0400, Heikki Linnakangas wrote:

On 23.05.2013 08:03, Simon Riggs wrote:

On 23 May 2013 12:10, Heikki Linnakangas<hlinnakangas@vmware.com> wrote:

Please take a look: https://github.com/vmware/pg_rewind

The COPYRIGHT file shows that VMware is claiming copyright on unstated
parts of the code for this. As such, its not a normal submission to
the PostgreSQL project, which involves placing copyright with the
PGDG.

We have a lot of code in PostgreSQL source tree with different
copyright notices, and there's no problem with that as long as the
coe is licensed under the PostgreSQL license. For patches that add

Really? Where? I think we have removed them all, as far as I know.
A quick grep shows:

$ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Can someone comment on the "Philip Warner" item? Would someone contact
him to clarify we can remove the mention? CC'ing him.

or modify code in PostgreSQL, we generally have copyright notices
with just PGDG, to avoid having a long list of copyright notices of
a lot of companies and individuals on every file. I'm no lawyer, but
I believe there's no difference from the legal point of view.

Probably, but some mentions can cause concern when our code is reviewed
by companies, so simplicity is good.

As a result, while it sounds interesting, people should be aware of
that and I suggest we shouldn't discuss that code on this list, to
avoid any disputes should we decide to include a similar facility in
core Postgres in the future.

That's just paranoia. There are a lot of tools out there on
pgfoundry, with various copyright holders and even difference
licenses, and it's fine to talk about all those on this list.
Besides, the code is licensed under the PostgreSQL license, so if
someone decides we should have this e.g in contrib, you can just
grab the sources and commit. Thirdly, there's no reason to refrain
from even discussing this, even if someone would include a similar
facility in core Postgres - this is about copyrights, not patents
(and yes, this contribution has been cleared by VMware legal
department; VMware doesn't hold any patents on this)

I think Simon has a good point, as VMWare has asserted patents on some
changes to their version of Postgres in the past, so if the copyright
mentions VMWare, we can't assume it is patent-free. Just the fact you
had to check with the VMware legal department verifies there is cause
for concern about things coming from VMWare. In fact, I am curious what
level of contribution requires a legal check, but I am not sure you can
even share that information.

Anyway, I would love to think we don't need to worry about this, but I
think we do --- not in this case, but in general. I acknowledge that
VMWare has been disciplined in share only patent-free information, at
the community's request.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andres Freund
andres@2ndquadrant.com
In reply to: Bruce Momjian (#15)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:

We have a lot of code in PostgreSQL source tree with different
copyright notices, and there's no problem with that as long as the
coe is licensed under the PostgreSQL license. For patches that add

Really? Where? I think we have removed them all, as far as I know.
A quick grep shows:

$ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the "Portions" part from your grep, and you will see quite
some more...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Bruce Momjian
bruce@momjian.us
In reply to: Andres Freund (#16)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote:

On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:

We have a lot of code in PostgreSQL source tree with different
copyright notices, and there's no problem with that as long as the
coe is licensed under the PostgreSQL license. For patches that add

Really? Where? I think we have removed them all, as far as I know.
A quick grep shows:

$ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the "Portions" part from your grep, and you will see quite
some more...

Oh, I see. Have we historically been OK with these as long as it is
clear it is the PG copyright? I know we had do some cleanups in the
past, but I don't remember the details, obviously.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@2ndquadrant.com
In reply to: Bruce Momjian (#17)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 2013-05-28 14:50:57 -0400, Bruce Momjian wrote:

On Tue, May 28, 2013 at 08:37:44PM +0200, Andres Freund wrote:

On 2013-05-28 14:32:07 -0400, Bruce Momjian wrote:

We have a lot of code in PostgreSQL source tree with different
copyright notices, and there's no problem with that as long as the
coe is licensed under the PostgreSQL license. For patches that add

Really? Where? I think we have removed them all, as far as I know.
A quick grep shows:

$ grep -r 'Portions Copyright'|egrep -v 'Global|Regents'
./src/backend/regex/regexport.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/backend/regex/regprefix.c: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/regex/regexport.h: * Portions Copyright (c) 1998, 1999 Henry Spencer
./src/include/getopt_long.h: * Portions Copyright (c) 1987, 1993, 1994
./src/bin/pg_dump/pg_backup_directory.c: * Portions Copyright (c) 2000, Philip Warner
./src/port/getopt_long.c: * Portions Copyright (c) 1987, 1993, 1994
./src/port/getopt_long.c: * Portions Copyright (c) 2003

Just remove the "Portions" part from your grep, and you will see quite
some more...

Oh, I see. Have we historically been OK with these as long as it is
clear it is the PG copyright? I know we had do some cleanups in the
past, but I don't remember the details, obviously.

I don't see a problem with a different copyrights as long as the
licenses are compatible. I remember code getting (re-)moved because it
was GPL, which is a different thing to having a different copyright.

I don't have a all that wide look over the history though.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#17)
Re: pg_rewind, a tool for resynchronizing an old master after failover

Bruce Momjian wrote:

Oh, I see. Have we historically been OK with these as long as it is
clear it is the PG copyright? I know we had do some cleanups in the
past, but I don't remember the details, obviously.

We've had request from companies because they wanted to distribute
Postgres and lawyers weren't comfortable with copyright statements in
assorted files. In those cases we've asked the people mentioned in such
copyright statements, got approval to remove the offending copyright
lines, and removed them.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Christophe Pettus
xof@thebuild.com
In reply to: Alvaro Herrera (#19)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On May 28, 2013, at 12:49 PM, Alvaro Herrera wrote:

We've had request from companies because they wanted to distribute
Postgres and lawyers weren't comfortable with copyright statements in
assorted files. In those cases we've asked the people mentioned in such
copyright statements, got approval to remove the offending copyright
lines, and removed them.

I assume this topic has come up and been rejected for some reason, but just in case: The Django project requires an explicit agreement for contributions that end up in the main source tree for it, part of which is the acceptance of the Django license and copyright notice. (I don't have my copy right in front of me, but I don't think it's a full-on assignment of copyright.)

--
-- Christophe Pettus
xof@thebuild.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#19)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Tue, May 28, 2013 at 03:49:14PM -0400, Alvaro Herrera wrote:

Bruce Momjian wrote:

Oh, I see. Have we historically been OK with these as long as it is
clear it is the PG copyright? I know we had do some cleanups in the
past, but I don't remember the details, obviously.

We've had request from companies because they wanted to distribute
Postgres and lawyers weren't comfortable with copyright statements in
assorted files. In those cases we've asked the people mentioned in such
copyright statements, got approval to remove the offending copyright
lines, and removed them.

OK, so it was different _licenses_ that was the problem. OK.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 05/28/2013 11:32 AM, Bruce Momjian wrote:

I think Simon has a good point, as VMWare has asserted patents on some
changes to their version of Postgres in the past, so if the copyright

... which I'll point out that they *didn't* contribute, and which may
yet get resolved in a way that benefits the PostgreSQL community.

mentions VMWare, we can't assume it is patent-free. Just the fact you
had to check with the VMware legal department verifies there is cause
for concern about things coming from VMWare.

That seems rather like a catch-22 Bruce. If they don't check with the
legal department, it's dangerous, but if they do check, it's dangerous?

Presumably if they checked with the legal department, it's cleared. We
should be wary of stuff contributed by company employees who *didn't* check.

This particular tool seems highly unlikely to be legitimately
patentable, anyway. There's too much prior art.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23Joshua D. Drake
jd@commandprompt.com
In reply to: Josh Berkus (#22)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On 06/04/2013 01:55 PM, Josh Berkus wrote:

That seems rather like a catch-22 Bruce. If they don't check with the
legal department, it's dangerous, but if they do check, it's dangerous?

Presumably if they checked with the legal department, it's cleared. We
should be wary of stuff contributed by company employees who *didn't* check.

This particular tool seems highly unlikely to be legitimately
patentable, anyway. There's too much prior art.

"legitimately patentable" is a rather ethereal phrase in our industry
right now don't you think?

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#22)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Tue, Jun 4, 2013 at 01:55:27PM -0700, Josh Berkus wrote:

On 05/28/2013 11:32 AM, Bruce Momjian wrote:

I think Simon has a good point, as VMWare has asserted patents on some
changes to their version of Postgres in the past, so if the copyright

... which I'll point out that they *didn't* contribute, and which may
yet get resolved in a way that benefits the PostgreSQL community.

Right. Simon was just verifying that we were good with this new
feature, as it had a VMWare copyright and was on github --- totally
legitimate question.

mentions VMWare, we can't assume it is patent-free. Just the fact you
had to check with the VMware legal department verifies there is cause
for concern about things coming from VMWare.

That seems rather like a catch-22 Bruce. If they don't check with the
legal department, it's dangerous, but if they do check, it's dangerous?

Presumably if they checked with the legal department, it's cleared. We
should be wary of stuff contributed by company employees who *didn't* check.

My point is that there was no mention of a legal check in the original
posting, which prompted Simon to ask:

/messages/by-id/519DF910.4020609@vmware.com

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Greg Stark
stark@mit.edu
In reply to: Simon Riggs (#3)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Thu, May 23, 2013 at 1:03 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

The COPYRIGHT file shows that VMware is claiming copyright on unstated
parts of the code for this. As such, its not a normal submission to
the PostgreSQL project, which involves placing copyright with the
PGDG.

Fwiw I was under the same misconception when I started at Google. But
this is wrong.

We have no copyright assignments to any entity named PGDG. All the
code is copyright the original authors. The PGDG is just a collective
noun for all the the people and organizations who have contributed to
Postgres. As long as all those people or organizations release the
code under the Postgres license then Postgres is ok with it. They
retain ownership of the copyright for the code they wrote but we don't
generally note it at that level of detail and just say everything is
owned by the PGDG.

I'm not a lawyer and I make no judgement on how solid a practice this
is but that's VMware doesn't seem to be doing anything special here.
They can retain copyright ownership of their contributions as long as
they're happy releasing it under the Postgres copyright. Ideally they
wold also be happy with a copyright notice that includes all of the
PGDG just to reduce the maintenance headache.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Bruce Momjian
bruce@momjian.us
In reply to: Greg Stark (#25)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Wed, Jun 5, 2013 at 10:12:17AM +0100, Greg Stark wrote:

On Thu, May 23, 2013 at 1:03 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

The COPYRIGHT file shows that VMware is claiming copyright on unstated
parts of the code for this. As such, its not a normal submission to
the PostgreSQL project, which involves placing copyright with the
PGDG.

Fwiw I was under the same misconception when I started at Google. But
this is wrong.

We have no copyright assignments to any entity named PGDG. All the
code is copyright the original authors. The PGDG is just a collective
noun for all the the people and organizations who have contributed to
Postgres. As long as all those people or organizations release the
code under the Postgres license then Postgres is ok with it. They
retain ownership of the copyright for the code they wrote but we don't
generally note it at that level of detail and just say everything is
owned by the PGDG.

I'm not a lawyer and I make no judgement on how solid a practice this
is but that's VMware doesn't seem to be doing anything special here.
They can retain copyright ownership of their contributions as long as
they're happy releasing it under the Postgres copyright. Ideally they
wold also be happy with a copyright notice that includes all of the
PGDG just to reduce the maintenance headache.

Yes, completely true, and I was not clear on that myself either.
Several people pointed out similar user copyrights in our existing code,
which I then realized were not a problem. As long as the copyright
details are the same as our code, anyone can hold the copyright, I
think.

Part of my concern was patents. Because VMWare asserts patents on
Postgres enhancements, when I saw VMWare copyright code, my "concern"
antenna went up and was glad to find it had all be handled by Heikki
already.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#26)
Re: pg_rewind, a tool for resynchronizing an old master after failover

Bruce Momjian <bruce@momjian.us> writes:

On Wed, Jun 5, 2013 at 10:12:17AM +0100, Greg Stark wrote:

I'm not a lawyer and I make no judgement on how solid a practice this
is but that's VMware doesn't seem to be doing anything special here.
They can retain copyright ownership of their contributions as long as
they're happy releasing it under the Postgres copyright. Ideally they
wold also be happy with a copyright notice that includes all of the
PGDG just to reduce the maintenance headache.

Yes, completely true, and I was not clear on that myself either.
Several people pointed out similar user copyrights in our existing code,
which I then realized were not a problem. As long as the copyright
details are the same as our code, anyone can hold the copyright, I
think.

You're both being quite sloppy about the difference between "copyright"
and "license". The point is correct though: what we care about is that
everybody releases their work under the same *license terms*. As long
as that's the case, we don't care terribly much exactly who holds
copyright on which parts of the code. (In this analysis, "PGDG" is
basically a shorthand for "everybody who's ever contributed anything".)

Part of my concern was patents. Because VMWare asserts patents on
Postgres enhancements, when I saw VMWare copyright code, my "concern"
antenna went up and was glad to find it had all be handled by Heikki
already.

Yes, patents are a different and much nastier can of worms.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#1)
Re: pg_rewind, a tool for resynchronizing an old master after failover

I'm not a lawyer and I make no judgement on how solid a practice this

is but that's VMware doesn't seem to be doing anything special here.
They can retain copyright ownership of their contributions as long as
they're happy releasing it under the Postgres copyright. Ideally they
wold also be happy with a copyright notice that includes all of the
PGDG just to reduce the maintenance headache.

Many other projects also take this approach: Linux Kernel, Drizzle, etc.
There's some legal advantages, as well as disadvantages, in having the
copyright rest with the original contributors. Mostly, it prevents
relicensing of the whole project.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Dave Page
dpage@pgadmin.org
In reply to: Josh Berkus (#28)
Re: pg_rewind, a tool for resynchronizing an old master after failover

On Wed, Jun 5, 2013 at 6:10 PM, Josh Berkus <josh@agliodbs.com> wrote:

I'm not a lawyer and I make no judgement on how solid a practice this

is but that's VMware doesn't seem to be doing anything special here.
They can retain copyright ownership of their contributions as long as
they're happy releasing it under the Postgres copyright. Ideally they
wold also be happy with a copyright notice that includes all of the
PGDG just to reduce the maintenance headache.

Many other projects also take this approach: Linux Kernel, Drizzle, etc.
There's some legal advantages, as well as disadvantages, in having the
copyright rest with the original contributors. Mostly, it prevents
relicensing of the whole project.

No it doesn't - it just makes it a pain in the arse (I know, I've done it).

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers