snapbuild woes

Started by Petr Jelinek over 9 years ago. 92 messages. pgsql-hackers.
#1 Petr Jelinek
petr@2ndquadrant.com

Hi,

I recently found a couple of issues with the way the initial logical
decoding snapshot is made.

The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.
This in turn means that snapbuild will never produce a snapshot when that
situation occurs. The reason this hasn't bitten us much yet is that
snapbuild considers all transactions finished when an empty
xl_running_xacts arrives, which usually happens after a while (but might
not on a consistently busy server).

The fix I came up with is to extend the mechanism to also consider all
transactions finished when an xl_running_xacts record arrives whose xmin
is bigger than the previous xmax. This seems to work pretty well on busy
servers. The fix is attached as
0001-Mark-snapshot-consistent-when-all-running-txes-have.patch. I
believe it should be backpatched all the way to 9.4.
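The rule the patch proposes can be sketched as a small model (hypothetical Python names; the real code is C in snapbuild.c, and a real xid comparison must handle 32-bit wraparound, which this sketch ignores):

```python
# Two ways the builder can declare the initial snapshot consistent:
# 1. every xid from the first xl_running_xacts has been seen to finish, or
# 2. a later xl_running_xacts arrives whose xmin exceeds the first record's
#    xmax, proving all of those xids are gone even if their commit records
#    were written before tracking started.

class InitialSnapshotTracker:
    def __init__(self, running_xids, xmax):
        # xids listed in the first xl_running_xacts, and that record's xmax
        self.pending = set(running_xids)
        self.initial_xmax = xmax

    def transaction_finished(self, xid):
        self.pending.discard(xid)

    def new_running_xacts(self, xmin):
        # Proposed fix: an xl_running_xacts with xmin > previous xmax
        # implies every initially running transaction has ended.
        if xmin > self.initial_xmax:
            self.pending.clear()

    @property
    def consistent(self):
        return not self.pending


# A transaction whose commit record predates the first xl_running_xacts
# never produces a commit we can decode; the xmin check still lets us
# reach consistency.
t = InitialSnapshotTracker(running_xids={90, 95}, xmax=100)
t.transaction_finished(95)
assert not t.consistent          # xid 90 looks stuck forever
t.new_running_xacts(xmin=101)    # xmin 101 > previous xmax 100
assert t.consistent
```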

The other issue is a performance problem, again on busy servers, with the
initial snapshot. We track transactions that modify the catalog so that
we can do proper catalog time travel when decoding changes. For
transactions that were already running when we started trying to get the
initial consistent snapshot, there is no good way to tell whether they
made catalog changes, so we treat them all as catalog-changing, and we
build a separate historical snapshot for every such transaction. That by
itself is fine; the problem is that the current implementation also
treats transactions that started after we began watching for changes but
before we reached a consistent state as catalog-changing, even though for
those we actually do know whether they made any catalog change. In
practice this means we build snapshots that are not really necessary, and
if there was a long-running transaction that the snapshot builder had to
wait for, we can create thousands of unused snapshots, which affects
performance badly (I've seen the initial snapshot take an hour because of
this).

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.
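The decision the patch makes can be condensed into a tiny sketch (a hypothetical Python model, not the actual snapbuild code):

```python
# While waiting for consistency, only transactions that might have changed
# the catalog need a historical snapshot. For a transaction whose entire
# life we observed, we know the answer exactly; only the ones already
# running when we started must be assumed catalog-changing.

def needs_snapshot(seen_from_start, did_catalog_change):
    if not seen_from_start:
        return True              # conservative: might have changed the catalog
    return did_catalog_change    # we watched it, so we know

assert needs_snapshot(seen_from_start=False, did_catalog_change=False)
assert not needs_snapshot(seen_from_start=True, did_catalog_change=False)
assert needs_snapshot(seen_from_start=True, did_catalog_change=True)
```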

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Mark-snapshot-consistent-when-all-running-txes-have.patch (application/x-patch, +21/-9)
0002-Skip-unnecessary-snapshot-builds.patch (application/x-patch, +28/-11)
#2 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#1)
Re: snapbuild woes

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0002-Skip-unnecessary-snapshot-builds.patch (application/x-patch, +53/-30)
#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Petr Jelinek (#2)
Re: snapbuild woes

On 11 Dec. 2016 06:50, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

Attached no patch second time?

#4 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Craig Ringer (#3)
Re: snapbuild woes

On Sun, Dec 11, 2016 at 1:17 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:

On 11 Dec. 2016 06:50, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

Attached no patch second time?

I see an attachment, and it shows in the archives.

/messages/by-id/aee1d499-e3ca-e091-56da-1ee6a47741c8@2ndquadrant.com

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5 Craig Ringer
craig@2ndquadrant.com
In reply to: Kevin Grittner (#4)
Re: snapbuild woes

On 12 December 2016 at 00:36, Kevin Grittner <kgrittn@gmail.com> wrote:

On Sun, Dec 11, 2016 at 1:17 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:

On 11 Dec. 2016 06:50, "Petr Jelinek" <petr.jelinek@2ndquadrant.com> wrote:

On 10/12/16 23:10, Petr Jelinek wrote:

The attached 0002-Skip-unnecessary-snapshot-builds.patch changes this
behavior so that we don't build snapshots for transactions that we have
seen in their entirety and know made no catalog changes, even if we
haven't reached consistency yet.

Eh, attached wrong patch. This one is correct.

Attached no patch second time?

I see an attachment, and it shows in the archives.

/messages/by-id/aee1d499-e3ca-e091-56da-1ee6a47741c8@2ndquadrant.com

Sorry for the noise, apparently my phone's mail client was being dense.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#6 Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#1)
Re: snapbuild woes

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Regards,

Andres


#7 Petr Jelinek
petr@2ndquadrant.com
In reply to: Andres Freund (#6)
Re: snapbuild woes

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting; I did see the transaction's commit in the WAL before
the xl_running_xacts record that listed the xid as running. I've only
seen it on a production system though; I didn't manage to reproduce it
easily locally.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#8 Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#7)
Re: snapbuild woes

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting; I did see the transaction's commit in the WAL before
the xl_running_xacts record that listed the xid as running. I've only
seen it on a production system though; I didn't manage to reproduce it
easily locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does. So they're still running in the PGPROC
sense, just not the crash-recovery sense...
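The interleaving Andres describes can be sketched as a small event-order model (hypothetical Python; the function names in the comments refer to the real C routines):

```python
# RecordTransactionCommit() writes the commit record without holding
# ProcArrayLock; the xact only leaves the procarray in
# ProcArrayEndTransaction(). An xl_running_xacts taken between those two
# steps therefore lists the xid as running *after* its commit record.

wal = []
procarray = {42}                      # xid 42 is mid-commit

wal.append(("COMMIT", 42))            # RecordTransactionCommit()
wal.append(("RUNNING_XACTS", sorted(procarray)))  # LogStandbySnapshot()
procarray.discard(42)                 # ProcArrayEndTransaction()

# Decoding the WAL in order, the snapshot builder sees xid 42 listed as
# running even though its commit record already went by.
assert wal == [("COMMIT", 42), ("RUNNING_XACTS", [42])]
```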

Andres


#9 Petr Jelinek
petr@2ndquadrant.com
In reply to: Andres Freund (#8)
Re: snapbuild woes

On 12/12/16 23:33, Andres Freund wrote:

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
The first one is an outright bug, which has to do with how we track
running transactions. What snapbuild basically does while building the
initial snapshot is read the xl_running_xacts record, store the list of
running transactions, and then wait until they all finish. The problem is
that xl_running_xacts does not ensure that it only logs transactions that
are actually still running (to avoid locking PGPROC), so there might be
xids in xl_running_xacts that had already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting; I did see the transaction's commit in the WAL before
the xl_running_xacts record that listed the xid as running. I've only
seen it on a production system though; I didn't manage to reproduce it
easily locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does. So they're still running in the PGPROC
sense, just not the crash-recovery sense...

That looks like a reasonable explanation. BTW, I realized my patch needs
a bit more work; currently it will break the actual snapshot, as it
behaves the same as if xl_running_xacts was empty, which is not correct
AFAICS.

Also, if we took the approach suggested by my patch (i.e., using this
xmin/xmax comparison), I guess we wouldn't need to hold the lock for the
extra time at wal_level logical anymore.

That is of course unless you think it should be approached from the
other side of the stream and we should try to log a correct
xl_running_xacts.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#9)
Re: snapbuild woes

On 13/12/16 00:38, Petr Jelinek wrote:

That looks like a reasonable explanation. BTW, I realized my patch needs
a bit more work; currently it will break the actual snapshot, as it
behaves the same as if xl_running_xacts was empty, which is not correct
AFAICS.

Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I'm very happy with. What I did: in case we read an
xl_running_xacts record in which all the transactions we track have
finished, we start tracking from that new xl_running_xacts again, with
the difference that we clean up the running transactions based on
previously seen committed ones. That means that on a busy server we may
wait for multiple xl_running_xacts records rather than just one, but at
least we have a chance to finish, unlike the current coding, which
basically waits for an empty xl_running_xacts. I also removed the
additional locking for logical wal_level in LogStandbySnapshot() since
it does not work.

I also identified another bug in snapbuild while looking at the code:
logical decoding will try to use an on-disk serialized snapshot for the
initial snapshot export when it can. The problem is that these snapshots
are quite special and are not really usable as snapshots for data (i.e.,
the logical decoding snapshots regularly have xmax smaller than xmin).
So when a client tries to use this exported snapshot it gets completely
wrong data, as the snapshot is broken. I think this is the explanation
for Erik Rijkers's problems with the initial COPY patch for logical
replication. At least for me the issues go away when I disable use of
the on-disk snapshots.
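Why xmax smaller than xmin breaks a data snapshot can be seen from the textbook MVCC visibility rule (a simplified Python sketch, not PostgreSQL's actual XidInMVCCSnapshot()):

```python
# Visibility of a committed tuple's xid under a snapshot (xmin, xmax, xip):
# xids at or above xmax started after the snapshot; xids below xmin
# finished before it; the in-progress list decides the rest. A snapshot
# with xmax < xmin, which logical decoding's serialized snapshots can
# legitimately have for *catalog* decoding, is nonsense for user data.

def xid_visible(xid, xmin, xmax, xip):
    if xid >= xmax:          # started at/after the snapshot: invisible
        return False
    if xid < xmin:           # finished before the snapshot: visible
        return True
    return xid not in xip    # in-progress list decides the rest

# Sane snapshot (xmin <= xmax): a long-committed xid is visible.
assert xid_visible(10, xmin=20, xmax=30, xip=set())

# Broken exported snapshot with xmax < xmin: xid 25 committed long before
# xmin, yet the xmax test hides it, so clients read wrong data.
assert not xid_visible(25, xmin=30, xmax=20, xip=set())
```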

I didn't really find a better solution than that, though (disabling the
use of on-disk snapshots for the initial consistent snapshot).

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a snapshot on a busy
server due to a race condition in xl_running_xacts logging. The original
use of extra locking does not seem to be enough in practice. Once we
have an agreed fix for this it's probably worth backpatching. There are
still some comments that need updating; this is more of a PoC.

Thoughts or better ideas?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0001-Skip-unnecessary-snapshot-builds.patch (text/x-patch, +53/-30)
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (text/x-patch, +2/-13)
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (text/x-patch, +64/-23)
#11 Erik Rijkers
er@xs4all.nl
In reply to: Petr Jelinek (#10)
Re: snapbuild woes

On 2017-02-22 03:05, Petr Jelinek wrote:

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a snapshot on a busy
server due to a race condition in xl_running_xacts logging. The original
use of extra locking does not seem to be enough in practice. Once we
have an agreed fix for this it's probably worth backpatching. There are
still some comments that need updating; this is more of a PoC.

I am not entirely sure what to expect. Should a server with these 3
patches do the initial data copy or not? The sgml seems to imply there
is no initial data copy. But my test does copy something.

Anyway, I have repeated the same old pgbench test, assuming the initial
data copy should be working.

With

0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch

the consistent (but wrong) end state is always that only one of the four
pgbench tables, pgbench_history, is replicated (always correctly).

Below is the output from the test (I've edited the lines for email)
(below, a,b,t,h stand for: pgbench_accounts, pgbench_branches,
pgbench_tellers, pgbench_history)
(master on port 6972, replica on port 6973.)

port
6972 a,b,t,h: 100000 1 10 347
6973 a,b,t,h: 0 0 0 347

a,b,t,h: a68efc81a 2c27f7ba5 128590a57 1e4070879 master
a,b,t,h: d41d8cd98 d41d8cd98 d41d8cd98 1e4070879 replica NOK

The md5 strings are the md5 of the whole content of each table
(an ordered SELECT *).

I repeated this a few times: of course, the number of rows in
pgbench_history varies a bit but otherwise it is always the same: 3
empty replica tables, pgbench_history replicated correctly.

Something is not right.

thanks,

Erik Rijkers


#12 Petr Jelinek
petr@2ndquadrant.com
In reply to: Erik Rijkers (#11)
Re: snapbuild woes

On 22/02/17 11:29, Erik Rijkers wrote:

On 2017-02-22 03:05, Petr Jelinek wrote:

So to summarize the attached patches:
0001 - Fixes a performance issue where we build tons of snapshots that we
don't need, which kills CPU.

0002 - Disables the use of on-disk historical snapshots for initial
consistent snapshot export, as it may result in corrupt data. This
definitely needs a backport.

0003 - Fixes a bug where we might never reach a snapshot on a busy
server due to a race condition in xl_running_xacts logging. The original
use of extra locking does not seem to be enough in practice. Once we
have an agreed fix for this it's probably worth backpatching. There are
still some comments that need updating; this is more of a PoC.

I am not entirely sure what to expect. Should a server with these 3
patches do the initial data copy or not? The sgml seems to imply there
is no initial data copy. But my test does copy something.

Not by itself (without the copy patch); those fixes are for snapshots.

With

0001-Skip-unnecessary-snapshot-builds.patch
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch

the consistent (but wrong) end state is always that only one of the four
pgbench tables, pgbench_history, is replicated (always correctly).

Below is the output from the test (I've edited the lines for email)
(below, a,b,t,h stand for: pgbench_accounts, pgbench_branches,
pgbench_tellers, pgbench_history)
(master on port 6972, replica on port 6973.)

port
6972 a,b,t,h: 100000 1 10 347
6973 a,b,t,h: 0 0 0 347

a,b,t,h: a68efc81a 2c27f7ba5 128590a57 1e4070879 master
a,b,t,h: d41d8cd98 d41d8cd98 d41d8cd98 1e4070879 replica NOK

The md5 strings are the md5 of the whole content of each table
(an ordered SELECT *).

I repeated this a few times: of course, the number of rows in
pgbench_history varies a bit but otherwise it is always the same: 3
empty replica tables, pgbench_history replicated correctly.

That's actually correct behaviour without the initial copy patch: it
replicates changes, but since the 3 tables only get updates, there is
nothing to replicate, as there is no data downstream. However, inserts
will of course work fine even without data downstream.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#13 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#10)
Re: snapbuild woes

On 22/02/17 03:05, Petr Jelinek wrote:


Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I'm very happy with. What I did: in case we read an
xl_running_xacts record in which all the transactions we track have
finished, we start tracking from that new xl_running_xacts again, with
the difference that we clean up the running transactions based on
previously seen committed ones. That means that on a busy server we may
wait for multiple xl_running_xacts records rather than just one, but at
least we have a chance to finish, unlike the current coding, which
basically waits for an empty xl_running_xacts. I also removed the
additional locking for logical wal_level in LogStandbySnapshot() since
it does not work.

Not hearing any opposition to this idea, I decided to polish it and
also optimize it a bit.

That being said, thanks to testing from Erik Rijkers I've identified one
more bug in how we do the initial snapshot. Apparently we don't reserve
the global xmin when we start building the initial exported snapshot for
a slot (we only reserve catalog_xmin, which is fine for logical decoding
but not for the exported snapshot), so VACUUM and heap pruning will
happily delete old versions of rows that are still needed by anybody
trying to use that exported snapshot.
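The horizon computation at issue can be modelled roughly like this (hypothetical Python; the names only loosely mirror a slot's xmin and catalog_xmin fields):

```python
# VACUUM's cutoff for user tables honors the slots' data xmin but not
# catalog_xmin (catalog_xmin only holds back catalog rows for decoding).
# A slot that reserves only catalog_xmin leaves the rows an exported
# snapshot still needs unprotected.

def data_vacuum_horizon(oldest_active_xid, slot_data_xmins):
    # Oldest xid whose user-table row versions must be kept.
    xids = [oldest_active_xid] + [x for x in slot_data_xmins if x is not None]
    return min(xids)

# Bug: slot reserved only catalog_xmin (data xmin left unset), so the data
# horizon advances and pruning may remove rows the exported snapshot needs.
assert data_vacuum_horizon(200, [None]) == 200

# Fix: also reserve the global (data) xmin while building the exported
# snapshot, holding the horizon back.
assert data_vacuum_horizon(200, [150]) == 150
```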

Attached are updated versions of the patches:

0001 - Fixes the above-mentioned global xmin tracking issue. Needs to
be backported all the way to 9.4.

0002 - Removes use of the logical decoding saved snapshots for the
initial exported snapshot, since those snapshots only work for catalogs
and not user data. Also needs to be backported all the way to 9.4.

0003 - Changes handling of xl_running_xacts in the initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0004 - Improves performance of initial snapshot building by skipping
the catalog snapshot build for transactions that don't do catalog
changes.

0001 and 0002 are bug fixes, because without them the exported
snapshots are basically corrupted. 0003 and 0004 are performance
improvements, but on busy servers the snapshot export might never
happen, so they address rather serious performance issues.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

snapbuild-v3-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch (text/x-patch, +27/-5)
snapbuild-v3-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (text/x-patch, +2/-13)
snapbuild-v3-0003-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (text/x-patch, +57/-33)
snapbuild-v3-0004-Skip-unnecessary-snapshot-builds.patch (text/x-patch, +53/-30)
#14 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#13)
Re: snapbuild woes

On 24/02/17 22:56, Petr Jelinek wrote:


Not hearing any opposition to this idea, I decided to polish it and
also optimize it a bit.

That being said, thanks to testing from Erik Rijkers I've identified one
more bug in how we do the initial snapshot. Apparently we don't reserve
the global xmin when we start building the initial exported snapshot for
a slot (we only reserve catalog_xmin, which is fine for logical decoding
but not for the exported snapshot), so VACUUM and heap pruning will
happily delete old versions of rows that are still needed by anybody
trying to use that exported snapshot.

Aaand I found one more bug in snapbuild: apparently we don't protect
the snapshot builder's xmin from going backwards, which can yet again
result in a corrupted exported snapshot.
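The guard can be sketched in a couple of lines (simplified Python; a real implementation would use wraparound-aware xid comparison, which this ignores):

```python
# Only ever advance the builder's xmin; a later xl_running_xacts whose
# oldest running xid is older than what we already computed must not drag
# the xmin backwards, or an already exported snapshot can be invalidated
# once pruning catches up.

def advance_xmin(builder_xmin, running_xacts_oldest):
    return max(builder_xmin, running_xacts_oldest)

assert advance_xmin(100, 120) == 120   # normal forward progress
assert advance_xmin(120, 100) == 120   # backwards move is ignored
```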

Summary of the attached patches:
0001 - Fixes the above-mentioned global xmin tracking issue. Needs to
be backported all the way to 9.4.

0002 - Removes use of the logical decoding saved snapshots for the
initial exported snapshot, since those snapshots only work for catalogs
and not user data. Also needs to be backported all the way to 9.4.

0003 - Makes sure the snapshot builder's xmin is not moved backwards by
xl_running_xacts (which can otherwise happen during initial snapshot
building). Should also be backported to 9.4.

0004 - Changes handling of xl_running_xacts in the initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0005 - Improves performance of initial snapshot building by skipping
the catalog snapshot build for transactions that don't do catalog
changes.

I made some improvements to the other patches as well, so they are not
the same as in the previous post, hence the version bump.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

snapbuild-v4-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch (text/x-patch, +20/-2)
snapbuild-v4-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (text/x-patch, +2/-13)
snapbuild-v4-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch (text/x-patch, +2/-2)
snapbuild-v4-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (text/x-patch, +63/-33)
snapbuild-v4-0005-Skip-unnecessary-snapshot-builds.patch (text/x-patch, +53/-30)
#15 Petr Jelinek
petr@2ndquadrant.com
In reply to: Petr Jelinek (#14)
Re: snapbuild woes

On 26/02/17 01:43, Petr Jelinek wrote:

On 24/02/17 22:56, Petr Jelinek wrote:

On 22/02/17 03:05, Petr Jelinek wrote:

On 13/12/16 00:38, Petr Jelinek wrote:

On 12/12/16 23:33, Andres Freund wrote:

On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:

On 12/12/16 22:42, Andres Freund wrote:

Hi,

On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:

Hi,
First one is outright bug, which has to do with how we track running
transactions. What snapbuild basically does while doing initial snapshot
is read the xl_running_xacts record, store the list of running txes and
then wait until they all finish. The problem with this is that
xl_running_xacts does not ensure that it only logs transactions that are
actually still running (to avoid locking PGPROC) so there might be xids
in xl_running_xacts that already committed before it was logged.

I don't think that's actually true? Notice how LogStandbySnapshot()
only releases the lock *after* the LogCurrentRunningXacts() iff
wal_level >= WAL_LEVEL_LOGICAL. So the explanation for the problem you
observed must actually be a bit more complex :(

Hmm, interesting. I did see the transaction commit in the WAL before the
xl_running_xacts that listed that xid as running. I've only seen it on a
production system though; I didn't manage to easily reproduce it
locally.

I suspect the reason for that is that RecordTransactionCommit() doesn't
conflict with ProcArrayLock in the first place - only
ProcArrayEndTransaction() does. So they're still running in the PGPROC
sense, just not the crash-recovery sense...

That looks like a reasonable explanation. BTW, I realized my patch needs
a bit more work: currently it will break the actual snapshot, as it
behaves the same as if the xl_running_xacts record were empty, which is
not correct AFAICS.

Hi,

I got to work on this again. Unfortunately I haven't found a solution
that I'm very happy with. What I did is: when we read an
xl_running_xacts record in which all the transactions we are tracking
have finished, we start tracking from that new xl_running_xacts again,
with the difference that we remove the running transactions we have
already seen commit. That means that on a busy server we may wait for
multiple xl_running_xacts records rather than just one, but at least we
have a chance to finish, unlike with the current coding, which
effectively waits for an empty xl_running_xacts. I also removed the
additional locking for logical wal_level in LogStandbySnapshot() since
it does not work.
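In case it helps review, here is a tiny standalone model of that idea
(hypothetical names, plain unsigned ints standing in for xids; this is
NOT the actual snapbuild code, just a sketch of the tracking logic):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_XIDS 64

/* xids we are still waiting on, and xids we have seen finish */
static unsigned int tracked[MAX_XIDS];  static int n_tracked;
static unsigned int done[MAX_XIDS];     static int n_done;
static bool started;

static bool is_done(unsigned int xid)
{
    for (int i = 0; i < n_done; i++)
        if (done[i] == xid)
            return true;
    return false;
}

/* Decoded a commit/abort record; returns true once consistent. */
static bool xact_finished(unsigned int xid)
{
    done[n_done++] = xid;
    for (int i = 0; i < n_tracked; i++)
        if (tracked[i] == xid)
            tracked[i] = tracked[--n_tracked];
    return started && n_tracked == 0;
}

/* Saw an xl_running_xacts-style record; returns true once consistent.
 * If the record's oldest running xid has passed everything we track,
 * the tracked xacts must all have finished even if we never decoded
 * their commits (they committed before the record listing them was
 * written), so we restart tracking from the new record, skipping any
 * xids we have already seen finish. */
static bool running_xacts(unsigned int oldest_running,
                          const unsigned int *xids, int n)
{
    if (started)
    {
        for (int i = 0; i < n_tracked; i++)
            if (tracked[i] >= oldest_running)
                return false;       /* keep waiting on the old set */
    }
    n_tracked = 0;
    for (int i = 0; i < n; i++)
        if (!is_done(xids[i]))
            tracked[n_tracked++] = xids[i];
    started = true;
    return n_tracked == 0;
}
```

The point of the restart is the first scenario in the test below: xid
101 committed before the record listing it was written, so its commit is
never decoded; a later record whose oldest running xid has passed 101
lets us start over instead of waiting forever.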

Not hearing any opposition to this idea, I decided to polish it and
optimize it a bit.

That being said, thanks to testing by Erik Rijkers I've identified one
more bug in how we build the initial snapshot. Apparently we don't
reserve the global xmin when we start building the initial exported
snapshot for a slot (we only reserve catalog_xmin, which is fine for
logical decoding but not for the exported snapshot), so VACUUM and heap
pruning will happily remove old versions of rows that are still needed
by anybody trying to use that exported snapshot.
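For readers following along, the shape of that fix is roughly this
(hypothetical field names in a toy slot struct, not the actual slot
code):

```c
#include <assert.h>

/* Simplified model of a replication slot's two horizons.  VACUUM may
 * remove any user-table row version older than effective_xmin, and any
 * catalog row version older than effective_catalog_xmin. */
struct slot
{
    unsigned int effective_xmin;          /* 0 = nothing reserved */
    unsigned int effective_catalog_xmin;
};

/* Before the fix: only the catalog horizon was held back while the
 * initial exported snapshot was being built. */
static void reserve_old(struct slot *s, unsigned int xmin)
{
    s->effective_catalog_xmin = xmin;
}

/* After the fix: hold back the global horizon too, so the user-table
 * rows the exported snapshot needs are not vacuumed away. */
static void reserve_new(struct slot *s, unsigned int xmin)
{
    s->effective_catalog_xmin = xmin;
    s->effective_xmin = xmin;
}
```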

Aaand I found one more bug in snapbuild. Apparently we don't protect the
snapshot builder's xmin from going backwards, which can yet again result
in a corrupted exported snapshot.
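The guard for that is simple in principle; something like the following
(hypothetical names; real xid comparisons wrap around and would use
TransactionIdFollows(), so plain > is only valid in this toy model):

```c
#include <assert.h>

static unsigned int builder_xmin;   /* snapshot builder's current xmin */

/* xl_running_xacts records can carry an older xmin while the initial
 * snapshot is being built; only ever move the horizon forward. */
static void maybe_advance_xmin(unsigned int candidate)
{
    if (candidate > builder_xmin)
        builder_xmin = candidate;
}
```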

Summary of attached patches:
0001 - Fixes the above-mentioned global xmin tracking issues. Needs to
be backported all the way to 9.4.

0002 - Removes use of the logical decoding saved snapshots for initial
exported snapshot since those snapshots only work for catalogs and not
user data. Also needs to be backported all the way to 9.4.

I was a bit overzealous about this one (I removed the use of the saved
snapshots completely), so here is a fix for that (and a rebase on top of
current HEAD).

0003 - Makes sure snapshot builder xmin is not moved backwards by
xl_running_xacts (which can otherwise happen during initial snapshot
building). Also should be backported to 9.4.

0004 - Changes handling of the xl_running_xacts in initial snapshot
build to what I wrote above and removes the extra locking from
LogStandbySnapshot introduced by logical decoding.

0005 - Improves performance of initial snapshot building by skipping
catalog snapshot build for transactions that don't do catalog changes.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch (+20 -2)
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch (+7 -6)
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch (+2 -2)
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch (+63 -33)
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch (+53 -30)
#16Erik Rijkers
er@xs4all.nl
In reply to: Petr Jelinek (#15)
Re: snapbuild woes

On 2017-03-03 01:30, Petr Jelinek wrote:

With these patches:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

I get:

subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows
non-static declaration
static int oid_cmp(const void *p1, const void *p2);
^~~~~~~
In file included from subscriptioncmds.c:42:0:
../../../src/include/utils/builtins.h:70:12: note: previous declaration
of ‘oid_cmp’ was here
extern int oid_cmp(const void *p1, const void *p2);
^~~~~~~
make[3]: *** [subscriptioncmds.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Petr Jelinek
petr@2ndquadrant.com
In reply to: Erik Rijkers (#16)
Re: snapbuild woes

On 03/03/17 01:53, Erik Rijkers wrote:

On 2017-03-03 01:30, Petr Jelinek wrote:

With these patches:

0001-Use-asynchronous-connect-API-in-libpqwalreceiver.patch
0002-Fix-after-trigger-execution-in-logical-replication.patch
0003-Add-RENAME-support-for-PUBLICATIONs-and-SUBSCRIPTION.patch
snapbuild-v5-0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch
snapbuild-v5-0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch
snapbuild-v5-0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch
snapbuild-v5-0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch
snapbuild-v5-0005-Skip-unnecessary-snapshot-builds.patch
0001-Logical-replication-support-for-initial-data-copy-v6.patch

I get:

subscriptioncmds.c:47:12: error: static declaration of ‘oid_cmp’ follows
non-static declaration
static int oid_cmp(const void *p1, const void *p2);
^~~~~~~
In file included from subscriptioncmds.c:42:0:
../../../src/include/utils/builtins.h:70:12: note: previous declaration
of ‘oid_cmp’ was here
extern int oid_cmp(const void *p1, const void *p2);
^~~~~~~
make[3]: *** [subscriptioncmds.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [commands-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Yes, the copy patch needs a rebase as well, but these ones are fine.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#18David Steele
david@pgmasters.net
In reply to: Petr Jelinek (#17)
Re: snapbuild woes

On 3/2/17 7:54 PM, Petr Jelinek wrote:

Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

--
-David
david@pgmasters.net


#19Andres Freund
andres@anarazel.de
In reply to: David Steele (#18)
Re: snapbuild woes

On 2017-04-08 09:51:39 -0400, David Steele wrote:

On 3/2/17 7:54 PM, Petr Jelinek wrote:

Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

FWIW, as these are bug-fixes that need to be backpatched, I do plan to
work on them soon.

- Andres


#20Erik Rijkers
er@xs4all.nl
In reply to: Andres Freund (#19)
Re: snapbuild woes

On 2017-04-08 15:56, Andres Freund wrote:

On 2017-04-08 09:51:39 -0400, David Steele wrote:

On 3/2/17 7:54 PM, Petr Jelinek wrote:

Yes the copy patch needs rebase as well. But these ones are fine.

This bug has been moved to CF 2017-07.

FWIW, as these are bug-fixes that need to be backpatched, I do plan to
work on them soon.

CF 2017-07 pertains to postgres 11, is that right?

But I hope you mean to commit these snapbuild patches before the
postgres 10 release? As far as I know, logical replication is still very
broken without them (or at least without some of that set of 5 patches -
I don't know which ones are essential and which may not be).

If it's at all useful I can repeat tests to show how often current
master still fails (easily a 50% or so failure rate).

This would be the pgbench-over-logical-replication test that I did so
often earlier on.

thanks,

Erik Rijkers


[Messages #21-#92 are collapsed in this archive view; the thread
continues among Andres Freund, David Steele, Noah Misch, Peter
Eisentraut, Simon Riggs, Petr Jelinek, Tom Lane, Andrew Dunstan, Craig
Ringer, Erik Rijkers, Stas Kelvich, and Antonin Houska.]