snapbuild woes

Started by Petr Jelinek over 9 years ago. 92 messages. pgsql-hackers.
#1 Petr Jelinek
petr@2ndquadrant.com

Hi,

I recently found a couple of issues with the way the initial logical
decoding snapshot is made.

The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.
This in turn means that snapbuild will never produce a snapshot when that
situation occurs. The reason this hasn't bitten us much yet is that
snapbuild considers all transactions finished when an empty
xl_running_xacts arrives, which usually happens after a while (but might
not on a consistently busy server).

The fix I came up with is to extend the mechanism to also consider all
transactions finished when an xl_running_xacts record arrives whose xmin
is bigger than the previous xmax. This seems to work pretty well on busy
servers. The fix is attached as
0001-Mark-snapshot-consistent-when-all-running-txes-have.patch. I
believe it should be backpatched all the way to 9.4.
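The rule the patch proposes can be sketched as a small model (hypothetical Python names; the real code is C in snapbuild.c, and a real xid comparison must handle 32-bit wraparound, which this sketch ignores):

```python
# Two ways the builder can declare the initial snapshot consistent:
# 1. every xid from the first xl_running_xacts has been seen to finish, or
# 2. a later xl_running_xacts arrives whose xmin exceeds the first record's
#    xmax, proving all of those xids are gone even if their commit records
#    were written before tracking started.

class InitialSnapshotTracker:
    def __init__(self, running_xids, xmax):
        # xids listed in the first xl_running_xacts, and that record's xmax
        self.pending = set(running_xids)
        self.initial_xmax = xmax

    def transaction_finished(self, xid):
        self.pending.discard(xid)

    def new_running_xacts(self, xmin):
        # Proposed fix: an xl_running_xacts with xmin > previous xmax
        # implies every initially running transaction has ended.
        if xmin > self.initial_xmax:
            self.pending.clear()

    @property
    def consistent(self):
        return not self.pending


# A transaction whose commit record predates the first xl_running_xacts
# never produces a commit we can decode; the xmin check still lets us
# reach consistency.
t = InitialSnapshotTracker(running_xids={90, 95}, xmax=100)
t.transaction_finished(95)
assert not t.consistent          # xid 90 looks stuck forever
t.new_running_xacts(xmin=101)    # xmin 101 > previous xmax 100
assert t.consistent
```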

The other issue is a performance problem, again on busy servers, with the
initial snapshot. We track transactions that modify the catalog so that
we can do proper catalog time travel when decoding changes. For
transactions that were already running when we started trying to get the
initial consistent snapshot, there is no good way to tell whether they
made catalog changes, so we treat them all as catalog-changing, and we
build a separate historical snapshot for every such transaction. That by
itself is fine; the problem is that the current implementation also
treats transactions that started after we began watching for changes but
before we reached a consistent state as catalog-changing, even though for
those we actually do know whether they made any catalog change. In
practice this means we build snapshots that are not really necessary, and
if there was a long-running transaction that the snapshot builder had to
wait for, we can create thousands of unused snapshots, which affects
performance badly (I've seen the initial snapshot take an hour because of
this).

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.
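The decision the patch makes can be condensed into a tiny sketch (a hypothetical Python model, not the actual snapbuild code):

```python
# While waiting for consistency, only transactions that might have changed
# the catalog need a historical snapshot. For a transaction whose entire
# life we observed, we know the answer exactly; only the ones already
# running when we started must be assumed catalog-changing.

def needs_snapshot(seen_from_start, did_catalog_change):
    if not seen_from_start:
        return True              # conservative: might have changed the catalog
    return did_catalog_change    # we watched it, so we know

assert needs_snapshot(seen_from_start=False, did_catalog_change=False)
assert not needs_snapshot(seen_from_start=True, did_catalog_change=False)
assert needs_snapshot(seen_from_start=True, did_catalog_change=True)
```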

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Mark-snapshot-consistent-when-all-running-txes-have.patch (application/x-patch, +21/-9)
0002-Skip-unnecessary-snapshot-builds.patch (application/x-patch, +28/-11)
#2 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#1)
Re: snapbuild woes

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0002-Skip-unnecessary-snapshot-builds.patch (application/x-patch, +53/-30)
#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Petr Jelinek (#2)
Re: snapbuild woes

On 11 Dec. 2016 06:50, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

Attached no patch second time?

#4 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Craig Ringer (#3)
Re: snapbuild woes

On Sun, Dec 11, 2016 at 1:17 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:

On 11 Dec. 2016 06:50, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

Attached no patch second time?

I see an attachment, and it shows in the archives.

/messages/by-id/aee1d499-e3ca-e091-56da-1ee6a47741c8@2ndquadrant.com

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5 Craig Ringer
craig@2ndquadrant.com
In reply to: Kevin Grittner (#4)
Re: snapbuild woes

On 12 December 2016 at 00:36, Kevin Grittner <kgrittn@gmail.com> wrote:

On Sun, Dec 11, 2016 at 1:17 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:

On 11 Dec. 2016 06:50, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

Attached no patch second time?

I see an attachment, and it shows in the archives.

/messages/by-id/aee1d499-e3ca-e091-56da-1ee6a47741c8@2ndquadrant.com

Sorry for the noise, apparently my phone's mail client was being dense.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#6 Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#1)
Re: snapbuild woes

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Regards,

Andres


#7 Petr Jelinek
petr@2ndquadrant.com
In reply to: Andres Freund (#6)
Re: snapbuild woes

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting; I did see the transaction's commit in the WAL before
the xl_running_xacts record that listed the xid as running. I've only
seen it on a production system though; I didn't manage to reproduce it
easily locally.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#8 Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#7)
Re: snapbuild woes

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting; I did see the transaction's commit in the WAL before
the xl_running_xacts record that listed the xid as running. I've only
seen it on a production system though; I didn't manage to reproduce it
easily locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does. So they're still running in the PGPROC
sense, just not the crash-recovery sense...
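The interleaving Andres describes can be sketched as a small event-order model (hypothetical Python; the function names in the comments refer to the real C routines):

```python
# RecordTransactionCommit() writes the commit record without holding
# ProcArrayLock; the xact only leaves the procarray in
# ProcArrayEndTransaction(). An xl_running_xacts taken between those two
# steps therefore lists the xid as running *after* its commit record.

wal = []
procarray = {42}                      # xid 42 is mid-commit

wal.append(("COMMIT", 42))            # RecordTransactionCommit()
wal.append(("RUNNING_XACTS", sorted(procarray)))  # LogStandbySnapshot()
procarray.discard(42)                 # ProcArrayEndTransaction()

# Decoding the WAL in order, the snapshot builder sees xid 42 listed as
# running even though its commit record already went by.
assert wal == [("COMMIT", 42), ("RUNNING_XACTS", [42])]
```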

Andres


#9 Petr Jelinek
petr@2ndquadrant.com
In reply to: Andres Freund (#8)
Re: snapbuild woes

On 12/12/16 23:33, Andres Freund wrote:

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting; I did see the transaction's commit in the WAL before
the xl_running_xacts record that listed the xid as running. I've only
seen it on a production system though; I didn't manage to reproduce it
easily locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does. So they're still running in the PGPROC
sense, just not the crash-recovery sense...

That looks like a reasonable explanation. BTW, I realized my patch needs
a bit more work; currently it will break the actual snapshot, as it
behaves the same as if xl_running_xacts was empty, which is not correct
AFAICS.

Also, if we took the approach suggested by my patch (i.e., using this
xmin/xmax comparison), I guess we wouldn't need to hold the lock for the
extra time at wal_level logical anymore.

That is of course unless you think it should be approached from the
other side of the stream and we should try to log a correct
xl_running_xacts.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#9)
Re: snapbuild woes

On 13/12/16 00:38, Petr Jelinek wrote:

That looks like a reasonable explanation. BTW, I realized my patch needs
a bit more work; currently it will break the actual snapshot, as it
behaves the same as if xl_running_xacts was empty, which is not correct
AFAICS.

Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I'm very happy with. What I did: in case we read an
xl_running_xacts record in which all the transactions we track have
finished, we start tracking from that new xl_running_xacts again, with
the difference that we clean up the running transactions based on
previously seen committed ones. That means that on a busy server we may
wait for multiple xl_running_xacts records rather than just one, but at
least we have a chance to finish, unlike the current coding, which
basically waits for an empty xl_running_xacts. I also removed the
additional locking for logical wal_level in LogStandbySnapshot() since
it does not work.

I also identified another bug in snapbuild while looking at the code:
logical decoding will try to use an on-disk serialized snapshot for the
initial snapshot export when it can. The problem is that these snapshots
are quite special and are not really usable as snapshots for data (i.e.,
the logical decoding snapshots regularly have xmax smaller than xmin).
So when a client tries to use this exported snapshot it gets completely
wrong data, as the snapshot is broken. I think this is the explanation
for Erik Rijkers's problems with the initial COPY patch for logical
replication. At least for me the issues go away when I disable use of
the on-disk snapshots.
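Why xmax smaller than xmin breaks a data snapshot can be seen from the textbook MVCC visibility rule (a simplified Python sketch, not PostgreSQL's actual XidInMVCCSnapshot()):

```python
# Visibility of a committed tuple's xid under a snapshot (xmin, xmax, xip):
# xids at or above xmax started after the snapshot; xids below xmin
# finished before it; the in-progress list decides the rest. A snapshot
# with xmax < xmin, which logical decoding's serialized snapshots can
# legitimately have for *catalog* decoding, is nonsense for user data.

def xid_visible(xid, xmin, xmax, xip):
    if xid >= xmax:          # started at/after the snapshot: invisible
        return False
    if xid < xmin:           # finished before the snapshot: visible
        return True
    return xid not in xip    # in-progress list decides the rest

# Sane snapshot (xmin <= xmax): a long-committed xid is visible.
assert xid_visible(10, xmin=20, xmax=30, xip=set())

# Broken exported snapshot with xmax < xmin: xid 25 committed long before
# xmin, yet the xmax test hides it, so clients read wrong data.
assert not xid_visible(25, xmin=30, xmax=20, xip=set())
```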

I didn't really find a better solution than that, though (disabling the
use of on-disk snapshots for the initial consistent snapshot).

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a snapshot on a busy
server due to a race condition in xl_running_xacts logging. The original
use of extra locking does not seem to be enough in practice. Once we
have an agreed fix for this it's probably worth backpatching. There are
still some comments that need updating; this is more of a PoC.

Thoughts or better ideas?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Skip-unnecessary-snapshot-builds.patch (text/x-patch, +53/-30)
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (text/x-patch, +2/-13)
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (text/x-patch, +64/-23)
#11 Erik Rijkers
er@xs4all.nl
In reply to: Petr Jelinek (#10)
Re: snapbuild woes

On 2017-02-22 03:05, Petr Jelinek wrote:

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a snapshot on a busy
server due to a race condition in xl_running_xacts logging. The original
use of extra locking does not seem to be enough in practice. Once we
have an agreed fix for this it's probably worth backpatching. There are
still some comments that need updating; this is more of a PoC.

I am not entirely sure what to expect. Should a server with these 3
patches do the initial data copy or not? The sgml seems to imply there
is no initial data copy. But my test does copy something.

Anyway, I have repeated the same old pgbench test, assuming the initial
data copy should be working.

With

0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch

the consistent (but wrong) end state is always that only one of the four
pgbench tables, pgbench_history, is replicated (always correctly).

Below is the output from the test (I've edited the lines for email)
(below, a,b,t,h stand for: pgbench_accounts, pgbench_branches,
pgbench_tellers, pgbench_history)
(master on port 6972, replica on port 6973.)

port
6972 a,b,t,h: 100000 1 10 347
6973 a,b,t,h: 0 0 0 347

a,b,t,h: a68efc81a 2c27f7ba5 128590a57 1e4070879 master
a,b,t,h: d41d8cd98 d41d8cd98 d41d8cd98 1e4070879 replica NOK

The md5 strings are the md5 of the whole content of each table
(an ordered SELECT *).

I repeated this a few times: of course, the number of rows in
pgbench_history varies a bit but otherwise it is always the same: 3
empty replica tables, pgbench_history replicated correctly.

Something is not right.

thanks,

Erik Rijkers


#12 Petr Jelinek
petr@2ndquadrant.com
In reply to: Erik Rijkers (#11)
Re: snapbuild woes

On 22/02/17 11:29, Erik Rijkers wrote:

On 2017-02-22 03:05, Petr Jelinek wrote:

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a snapshot on a busy
server due to a race condition in xl_running_xacts logging. The original
use of extra locking does not seem to be enough in practice. Once we
have an agreed fix for this it's probably worth backpatching. There are
still some comments that need updating; this is more of a PoC.

I am not entirely sure what to expect. Should a server with these 3
patches do the initial data copy or not? The sgml seems to imply there
is no initial data copy. But my test does copy something.

Not by itself (without the copy patch); those fixes are for snapshots.

With

0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch

the consistent (but wrong) end state is always that only one of the four
pgbench tables, pgbench_history, is replicated (always correctly).

Below is the output from the test (I've edited the lines for email)
(below, a,b,t,h stand for: pgbench_accounts, pgbench_branches,
pgbench_tellers, pgbench_history)
(master on port 6972, replica on port 6973.)

port
6972 a,b,t,h: 100000 1 10 347
6973 a,b,t,h: 0 0 0 347

a,b,t,h: a68efc81a 2c27f7ba5 128590a57 1e4070879 master
a,b,t,h: d41d8cd98 d41d8cd98 d41d8cd98 1e4070879 replica NOK

The md5 strings are the md5 of the whole content of each table
(an ordered SELECT *).

I repeated this a few times: of course, the number of rows in
pgbench_history varies a bit but otherwise it is always the same: 3
empty replica tables, pgbench_history replicated correctly.

That's actually correct behaviour without the initial copy patch: it
replicates changes, but since the 3 tables only get updates, there is
nothing to replicate, as there is no data downstream. However, inserts
will of course work fine even without data downstream.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#10)
Re: snapbuild woes

On 22/02/17 03:05, Petr Jelinek wrote:


Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I'm very happy with. What I did: in case we read an
xl_running_xacts record in which all the transactions we track have
finished, we start tracking from that new xl_running_xacts again, with
the difference that we clean up the running transactions based on
previously seen committed ones. That means that on a busy server we may
wait for multiple xl_running_xacts records rather than just one, but at
least we have a chance to finish, unlike the current coding, which
basically waits for an empty xl_running_xacts. I also removed the
additional locking for logical wal_level in LogStandbySnapshot() since
it does not work.

Not hearing any opposition to this idea, I decided to polish it and
also optimize it a bit.

That being said, thanks to testing from Erik Rijkers I've identified one
more bug in how we do the initial snapshot. Apparently we don't reserve
the global xmin when we start building the initial exported snapshot for
a slot (we only reserve catalog_xmin, which is fine for logical decoding
but not for the exported snapshot), so VACUUM and heap pruning will
happily delete old versions of rows that are still needed by anybody
trying to use that exported snapshot.
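The horizon computation at issue can be modelled roughly like this (hypothetical Python; the names only loosely mirror a slot's xmin and catalog_xmin fields):

```python
# VACUUM's cutoff for user tables honors the slots' data xmin but not
# catalog_xmin (catalog_xmin only holds back catalog rows for decoding).
# A slot that reserves only catalog_xmin leaves the rows an exported
# snapshot still needs unprotected.

def data_vacuum_horizon(oldest_active_xid, slot_data_xmins):
    # Oldest xid whose user-table row versions must be kept.
    xids = [oldest_active_xid] + [x for x in slot_data_xmins if x is not None]
    return min(xids)

# Bug: slot reserved only catalog_xmin (data xmin left unset), so the data
# horizon advances and pruning may remove rows the exported snapshot needs.
assert data_vacuum_horizon(200, [None]) == 200

# Fix: also reserve the global (data) xmin while building the exported
# snapshot, holding the horizon back.
assert data_vacuum_horizon(200, [150]) == 150
```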

Attached are updated versions of the patches:

0001 - Fixes the above-mentioned global xmin tracking issue. Needs to
be backported all the way to 9.4.

0002 - Removes use of the logical decoding saved snapshots for the
initial exported snapshot, since those snapshots only work for catalogs
and not user data. Also needs to be backported all the way to 9.4.

0003 - Changes handling of xl_running_xacts in the initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0004 - Improves performance of initial snapshot building by skipping
the catalog snapshot build for transactions that don't do catalog
changes.

0001 and 0002 are bug fixes, because without them the exported
snapshots are basically corrupted. 0003 and 0004 are performance
improvements, but on busy servers the snapshot export might never
happen, so they address rather serious performance issues.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch (text/x-patch, +27/-5)
snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (text/x-patch, +2/-13)
snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (text/x-patch, +57/-33)
snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch (text/x-patch, +53/-30)
#14 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#13)
Re: snapbuild woes

On 24/02/17 22:56, Petr Jelinek wrote:


Not hearing any opposition to this idea, I decided to polish it and
also optimize it a bit.

That being said, thanks to testing from Erik Rijkers I've identified one
more bug in how we do the initial snapshot. Apparently we don't reserve
the global xmin when we start building the initial exported snapshot for
a slot (we only reserve catalog_xmin, which is fine for logical decoding
but not for the exported snapshot), so VACUUM and heap pruning will
happily delete old versions of rows that are still needed by anybody
trying to use that exported snapshot.

Aaand I found one more bug in snapbuild: apparently we don't protect
the snapshot builder's xmin from going backwards, which can yet again
result in a corrupted exported snapshot.
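The guard can be sketched in a couple of lines (simplified Python; a real implementation would use wraparound-aware xid comparison, which this ignores):

```python
# Only ever advance the builder's xmin; a later xl_running_xacts whose
# oldest running xid is older than what we already computed must not drag
# the xmin backwards, or an already exported snapshot can be invalidated
# once pruning catches up.

def advance_xmin(builder_xmin, running_xacts_oldest):
    return max(builder_xmin, running_xacts_oldest)

assert advance_xmin(100, 120) == 120   # normal forward progress
assert advance_xmin(120, 100) == 120   # backwards move is ignored
```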

Summary of the attached patches:
0001 - Fixes the above-mentioned global xmin tracking issue. Needs to
be backported all the way to 9.4.

0002 - Removes use of the logical decoding saved snapshots for the
initial exported snapshot, since those snapshots only work for catalogs
and not user data. Also needs to be backported all the way to 9.4.

0003 - Makes sure the snapshot builder's xmin is not moved backwards by
xl_running_xacts (which can otherwise happen during initial snapshot
building). Should also be backported to 9.4.

0004 - Changes handling of xl_running_xacts in the initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0005 - Improves performance of initial snapshot building by skipping
the catalog snapshot build for transactions that don't do catalog
changes.

I made some improvements to the other patches as well, so they are not
the same as in the previous post, hence the version bump.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

snapbuild-v4-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch (text/x-patch, +20/-2)
snapbuild-v4-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (text/x-patch, +2/-13)
snapbuild-v4-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch (text/x-patch, +2/-2)
snapbuild-v4-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (text/x-patch, +63/-33)
snapbuild-v4-0005-Skip-unnecessary-snapshot-builds.patch (text/x-patch, +53/-30)
#15 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#14)
Re: snapbuild woes

On 26/02/17 01:43, Petr Jelinek wrote:

On 24/02/17 22:56, Petr Jelinek wrote:

On 22/02/17 03:05, Petr Jelinek wrote:

On 13/12/16 00:38, Petr Jelinek wrote:

On 12/12/16 23:33, Andres Freund wrote:

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
First one is outright bug, which has to do with how we track running
transactions. What snapbuild basically does while doing initial snapshot
is read the xl_running_xacts record, store the list of running txes and
then wait until they all finish. The problem with this is that
xl_running_xacts does not ensure that it only logs transactions that are
actually still running (to avoid locking PGPROC) so there might be xids
in xl_running_xacts that already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting. I did see the transaction commit in the WAL before the
xl_running_xacts that listed that xid as running. I've only seen it on a
production system though; I didn't manage to easily reproduce it
locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does. So they're still running in the PGPROC
sense, just not the crash-recovery sense...

That looks like a reasonable explanation. BTW, I realized my patch needs
a bit more work: currently it will break the actual snapshot, as it
behaves the same as if the xl_running_xacts record were empty, which is
not correct AFAICS.

Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I'm very happy with. What I did is: when we read an
xl_running_xacts record in which all the transactions we are tracking
have finished, we start tracking from that new xl_running_xacts again,
with the difference that we remove the running transactions we have
already seen commit. That means that on a busy server we may wait for
multiple xl_running_xacts records rather than just one, but at least we
have a chance to finish, unlike with the current coding, which
effectively waits for an empty xl_running_xacts. I also removed the
additional locking for logical wal_level in LogStandbySnapshot() since
it does not work.
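In case it helps review, here is a tiny standalone model of that idea
(hypothetical names, plain unsigned ints standing in for xids; this is
NOT the actual snapbuild code, just a sketch of the tracking logic):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_XIDS 64

/* xids we are still waiting on, and xids we have seen finish */
static unsigned int tracked[MAX_XIDS];  static int n_tracked;
static unsigned int done[MAX_XIDS];     static int n_done;
static bool started;

static bool is_done(unsigned int xid)
{
    for (int i = 0; i < n_done; i++)
        if (done[i] == xid)
            return true;
    return false;
}

/* Decoded a commit/abort record; returns true once consistent. */
static bool xact_finished(unsigned int xid)
{
    done[n_done++] = xid;
    for (int i = 0; i < n_tracked; i++)
        if (tracked[i] == xid)
            tracked[i] = tracked[--n_tracked];
    return started && n_tracked == 0;
}

/* Saw an xl_running_xacts-style record; returns true once consistent.
 * If the record's oldest running xid has passed everything we track,
 * the tracked xacts must all have finished even if we never decoded
 * their commits (they committed before the record listing them was
 * written), so we restart tracking from the new record, skipping any
 * xids we have already seen finish. */
static bool running_xacts(unsigned int oldest_running,
                          const unsigned int *xids, int n)
{
    if (started)
    {
        for (int i = 0; i < n_tracked; i++)
            if (tracked[i] >= oldest_running)
                return false;       /* keep waiting on the old set */
    }
    n_tracked = 0;
    for (int i = 0; i < n; i++)
        if (!is_done(xids[i]))
            tracked[n_tracked++] = xids[i];
    started = true;
    return n_tracked == 0;
}
```

The point of the restart is the first scenario in the test below: xid
101 committed before the record listing it was written, so its commit is
never decoded; a later record whose oldest running xid has passed 101
lets us start over instead of waiting forever.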

Not hearing any opposition to this idea, I decided to polish it and
optimize it a bit.

That being said, thanks to testing by Erik Rijkers I've identified one
more bug in how we build the initial snapshot. Apparently we don't
reserve the global xmin when we start building the initial exported
snapshot for a slot (we only reserve catalog_xmin, which is fine for
logical decoding but not for the exported snapshot), so VACUUM and heap
pruning will happily remove old versions of rows that are still needed
by anybody trying to use that exported snapshot.
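For readers following along, the shape of that fix is roughly this
(hypothetical field names in a toy slot struct, not the actual slot
code):

```c
#include <assert.h>

/* Simplified model of a replication slot's two horizons.  VACUUM may
 * remove any user-table row version older than effective_xmin, and any
 * catalog row version older than effective_catalog_xmin. */
struct slot
{
    unsigned int effective_xmin;          /* 0 = nothing reserved */
    unsigned int effective_catalog_xmin;
};

/* Before the fix: only the catalog horizon was held back while the
 * initial exported snapshot was being built. */
static void reserve_old(struct slot *s, unsigned int xmin)
{
    s->effective_catalog_xmin = xmin;
}

/* After the fix: hold back the global horizon too, so the user-table
 * rows the exported snapshot needs are not vacuumed away. */
static void reserve_new(struct slot *s, unsigned int xmin)
{
    s->effective_catalog_xmin = xmin;
    s->effective_xmin = xmin;
}
```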

Aaand I found one more bug in snapbuild. Apparently we don't protect the
snapshot builder's xmin from going backwards, which can yet again result
in a corrupted exported snapshot.
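The guard for that is simple in principle; something like the following
(hypothetical names; real xid comparisons wrap around and would use
TransactionIdFollows(), so plain > is only valid in this toy model):

```c
#include <assert.h>

static unsigned int builder_xmin;   /* snapshot builder's current xmin */

/* xl_running_xacts records can carry an older xmin while the initial
 * snapshot is being built; only ever move the horizon forward. */
static void maybe_advance_xmin(unsigned int candidate)
{
    if (candidate > builder_xmin)
        builder_xmin = candidate;
}
```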

Summary of attached patches:
0001 - Fixes the above-mentioned global xmin tracking issues. Needs to
be backported all the way to 9.4.

0002 - Removes use of the logical decoding saved snapshots for initial
exported snapshot since those snapshots only work for catalogs and not
user data. Also needs to be backported all the way to 9.4.

I was a bit overzealous about this one (I removed the use of the saved
snapshots completely), so here is a fix for that (and a rebase on top of
current HEAD).

0003 - Makes sure snapshot builder xmin is not moved backwards by
xl_running_xacts (which can otherwise happen during initial snapshot
building). Also should be backported to 9.4.

0004 - Changes handling of the xl_running_xacts in initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0005 - Improves performance of initial snapshot building by skipping
catalog snapshot build for transactions that don't do catalog changes.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch (+20 -2)
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (+7 -6)
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch (+2 -2)
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (+63 -33)
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch (+53 -30)
#16Erik Rijkers
er@xs4all.nl
In reply to: Petr Jelinek (#15)
Re: snapbuild woes

On 2017-03-03 01:30, Petr Jelinek wrote:

With these patches:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

I get:

subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows
non-static declaration
static int oid_cmp(const void *p1, const void *p2);
^~~~~~~
In file included from subscriptioncmds.c:42:0:
../../../src/include/utils/builtins.h:70:12: note: previous declaration
of ‘oid_cmp’ was here
extern int oid_cmp(const void *p1, const void *p2);
^~~~~~~
make[3]: *** [subscriptioncmds.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Petr Jelinek
petr@2ndquadrant.com
In reply to: Erik Rijkers (#16)
Re: snapbuild woes

On 03/03/17 01:53, Erik Rijkers wrote:

On 2017-03-03 01:30, Petr Jelinek wrote:

With these patches:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

I get:

subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows
non-static declaration
static int oid_cmp(const void *p1, const void *p2);
^~~~~~~
In file included from subscriptioncmds.c:42:0:
../../../src/include/utils/builtins.h:70:12: note: previous declaration
of ‘oid_cmp’ was here
extern int oid_cmp(const void *p1, const void *p2);
^~~~~~~
make[3]: *** [subscriptioncmds.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Yes, the copy patch needs a rebase as well, but these ones are fine.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#18David Steele
david@pgmasters.net
In reply to: Petr Jelinek (#17)
Re: snapbuild woes

On 3/2/17 7:54 PM, Petr Jelinek wrote:

Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

--
-David
david@pgmasters.net


#19Andres Freund
andres@anarazel.de
In reply to: David Steele (#18)
Re: snapbuild woes

On 2017-04-08 09:51:39 -0400, David Steele wrote:

On 3/2/17 7:54 PM, Petr Jelinek wrote:

Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

FWIW, as these are bug-fixes that need to be backpatched, I do plan to
work on them soon.

- Andres


#20Erik Rijkers
er@xs4all.nl
In reply to: Andres Freund (#19)
Re: snapbuild woes

On 2017-04-08 15:56, Andres Freund wrote:

On 2017-04-08 09:51:39 -0400, David Steele wrote:

On 3/2/17 7:54 PM, Petr Jelinek wrote:

Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

FWIW, as these are bug-fixes that need to be backpatched, I do plan to
work on them soon.

CF 2017-07 pertains to postgres 11, is that right?

But I hope you mean to commit these snapbuild patches before the
postgres 10 release? As far as I know, logical replication is still very
broken without them (or at least without some of that set of 5 patches -
I don't know which ones are essential and which may not be).

If it's at all useful I can repeat tests to show how often current
master still fails (easily a 50% or so failure rate).

This would be the pgbench-over-logical-replication test that I did so
often earlier on.

thanks,

Erik Rijkers


[Messages #21-#92 are collapsed in this archive view; the thread
continues among Andres Freund, David Steele, Noah Misch, Peter
Eisentraut, Simon Riggs, Petr Jelinek, Tom Lane, Andrew Dunstan, Craig
Ringer, Erik Rijkers, Stas Kelvich, and Antonin Houska.]