logical replication busy-waiting on a lock
When I create a subscription in the disabled state and then later run
"alter subscription sub enable;", on the master I sometimes get a tight
loop of the deadlock detector (log_lock_waits is on, of course).
deadlock_timeout is set to 1s, so I don't know why it seems to be running
several times per millisecond:
47462 idle in transaction 2017-05-26 16:05:20.505 PDT LOG: logical
decoding found initial starting point at 1B/7BAC9D50
47462 idle in transaction 2017-05-26 16:05:20.505 PDT DETAIL: Waiting for
transactions (approximately 9) older than 73326615 to end.
47462 idle in transaction waiting 2017-05-26 16:05:21.505 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.060 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.505 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.398 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.574 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.816 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1001.180 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1001.284 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1001.493 ms
.....
And so on out to "after 9616.814", when it finally acquires the lock.
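To put a number on "several times per millisecond", the elapsed-wait values in the log excerpt above can be differenced. What follows is a minimal Python sketch, not part of the original report; the helper names wait_times_ms and mean_gap_ms are made up for illustration, and the input is only the seven "still waiting" lines shown above.

```python
import re

# The seven "still waiting" entries from the log excerpt above (prefix trimmed).
LOG_EXCERPT = """
process 47462 still waiting for ShareLock on transaction 73322726 after 1000.060 ms
process 47462 still waiting for ShareLock on transaction 73322726 after 1000.398 ms
process 47462 still waiting for ShareLock on transaction 73322726 after 1000.574 ms
process 47462 still waiting for ShareLock on transaction 73322726 after 1000.816 ms
process 47462 still waiting for ShareLock on transaction 73322726 after 1001.180 ms
process 47462 still waiting for ShareLock on transaction 73322726 after 1001.284 ms
process 47462 still waiting for ShareLock on transaction 73322726 after 1001.493 ms
"""


def wait_times_ms(log_text):
    """Return the elapsed-wait values (in ms) reported by each log line."""
    return [float(m) for m in re.findall(r"after (\d+\.\d+) ms", log_text)]


def mean_gap_ms(times):
    """Average spacing between consecutive reports, in ms."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


times = wait_times_ms(LOG_EXCERPT)
gap = mean_gap_ms(times)
print(f"{len(times)} reports, mean gap {gap:.3f} ms, about {1 / gap:.1f} per ms")
```

On those seven entries the mean gap comes out at roughly 0.24 ms, that is, about four reports per millisecond, which is far more often than a 1s deadlock_timeout alone would suggest.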
The other process, 47457, is doing the initial COPY of another table as
part of the same publisher/subscriber set.
Cheers,
Jeff
On 27/05/17 01:25, Jeff Janes wrote:
> When I create a subscription in the disabled state and then later run
> "alter subscription sub enable;", on the master I sometimes get a tight
> loop of the deadlock detector (log_lock_waits is on, of course).
> deadlock_timeout is set to 1s, so I don't know why it seems to be
> running several times per millisecond.
> .....
> And so on out to "after 9616.814", when it finally acquires the lock.
> The other process, 47457, is doing the initial COPY of another table as
> part of the same publisher/subscriber set.
The snapshot builder waits on the locks of running transactions while the
snapshot is being built, so I guess that's what you are seeing. I am not
quite sure why the snapshot builder would hold the xid lock for a
prolonged period of time though; XactLockTableWait releases the lock
immediately after acquiring it. In fact, AFAICS everything that acquires
ShareLock on an xid releases it immediately after acquiring it, as it's
only used for waits.
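The "lock used only for waiting" pattern described here can be illustrated with a toy model. This is a Python sketch under heavy simplification, not PostgreSQL code: XidLockTable and its methods are hypothetical names, and a plain mutex stands in for the ExclusiveLock a transaction holds on its own xid and the ShareLock a waiter requests on it.

```python
import threading


class XidLockTable:
    """Toy stand-in for the lock table: one lock per running xid."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def begin(self, xid):
        """A transaction locks its own xid and holds the lock until it ends."""
        lock = threading.Lock()
        lock.acquire()
        with self._guard:
            self._locks[xid] = lock

    def end(self, xid):
        """Commit or abort: drop the entry and release the lock, waking waiters."""
        with self._guard:
            lock = self._locks.pop(xid)
        lock.release()

    def wait_for(self, xid):
        """Block until xid has ended: acquire its lock, then release it at once."""
        with self._guard:
            lock = self._locks.get(xid)
        if lock is None:
            return  # the transaction has already finished
        lock.acquire()  # sleeps here while the transaction is still running
        lock.release()  # the lock was only needed for the wait


table = XidLockTable()
table.begin(73322726)
finished = threading.Event()


def waiter():
    table.wait_for(73322726)
    finished.set()


thread = threading.Thread(target=waiter)
thread.start()
thread.join(timeout=0.1)
still_blocked = not finished.is_set()  # the waiter sleeps while the xid is live
table.end(73322726)
thread.join()
print(still_blocked, finished.is_set())
```

In this model the waiter sleeps inside acquire() and is woken once, when the transaction ends; nothing in it loops, which is the behaviour the thread expects from the real code as well.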
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 27/05/17 15:44, Petr Jelinek wrote:
> On 27/05/17 01:25, Jeff Janes wrote:
> > The other process, 47457, is doing the initial COPY of another table as
> > part of the same publisher/subscriber set.
>
> We lock wait for running transactions in snapshot builder while the
> snapshot is being built so I guess that's what you are seeing. I am not
> quite sure why the snapshot builder would hold the xid lock for
> prolonged period of time though, the XactLockTableWait releases the lock
> immediately after acquiring it. In fact AFAICS everything that acquires
> ShareLock on xid releases it immediately after acquiring as it's only
> used for waits.
Actually, I guess it's pid 47457 (the COPY process) that is actually
running xid 73322726. In that case it's the same thing Masahiko
Sawada reported [1], which is basically the result of the snapshot builder
waiting for a transaction to finish. That's normal if there is a long
transaction running when the snapshot is being created (and the COPY is
a long transaction).
[1]: /messages/by-id/CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc=UpBrZ-_MUxh-Q@mail.gmail.com
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On May 27, 2017 9:48:22 AM EDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> Actually, I guess it's the pid 47457 (COPY process) who is actually
> running the xid 73322726. In that case that's the same thing Masahiko
> Sawada reported [1]. Which basically is result of snapshot builder
> waiting for transaction to finish, that's normal if there is a long
> transaction running when the snapshot is being created (and the COPY is
> a long transaction).
Hm. I suspect the issue is that the exported snapshot needs an xid for some crosscheck, and that's what we're waiting for. Could you check what happens if you don't assign one and just comment the error checks out? Not at my computer, just theorizing.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, May 27, 2017 at 6:48 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> Actually, I guess it's the pid 47457 (COPY process) who is actually
> running the xid 73322726. In that case that's the same thing Masahiko
> Sawada reported [1].
Related, but not the same. It would be nice if they didn't block, but if
they do have to block, shouldn't it wait on a semaphore, rather than doing
a tight loop? It looks like maybe a latch didn't get reset when it should
have or something.
[1]: /messages/by-id/CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc%3DUpBrZ-_MUxh-Q%40mail.gmail.com
Cheers,
Jeff
On 2017-05-29 11:38:20 -0700, Jeff Janes wrote:
> Related, but not the same. It would be nice if they didn't block, but if
> they do have to block, shouldn't it wait on a semaphore, rather than doing
> a tight loop? It looks like maybe a latch didn't get reset when it should
> have or something.
The code certainly is trying to just block using a lock (on the xid of
the running xact), there shouldn't be any busy looping going on...
There's no latch involved, so something is certainly weird here.
- Andres
On 27/05/17 17:17, Andres Freund wrote:
> Hm. I suspect the issue is that the exported snapshot needs an xid for
> some crosscheck, and that's what we're waiting for. Could you check what
> happens if you don't assign one and just comment the error checks out?
> Not at my computer, just theorizing.
I don't think that's it; in my opinion it's the parallelization of the
table data copy, where we create a snapshot for one process but then the
next one has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the on-disk snapshot, so it
would work fine (except the snapshot was corrupted, of course). I wonder
if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to force the txes to stay read-only, so it does not
seem safe.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On May 29, 2017 11:58:05 AM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> I wonder if we could somehow give it a hint to ignore the read-only txes,
> but then we have no way to enforce the txes to stay read-only so it does
> not seem safe.
Read-only txs have no xid ...
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 29/05/17 20:59, Andres Freund wrote:
> Read-only txs have no xid ...
That's what I mean by hinting: normally they don't, but building the
initial snapshot in the snapshot builder calls GetTopTransactionId() (see
SnapBuildInitialSnapshot()), which will assign it an xid.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On May 29, 2017 12:21:50 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> That's what I mean by hinting, normally they don't but building initial
> snapshot in snapshot builder calls GetTopTransactionId() (see
> SnapBuildInitialSnapshot()) which will assign it xid.
That's precisely what I pointed out a few emails above, and what I suggest changing.
Andres
On 29/05/17 21:21, Petr Jelinek wrote:
> That's what I mean by hinting, normally they don't but building initial
> snapshot in snapshot builder calls GetTopTransactionId() (see
> SnapBuildInitialSnapshot()) which will assign it xid.
Looking at the code more, the xid is only used as a parameter for
SnapBuildBuildSnapshot(), which never does anything with that parameter,
so I wonder if it's really needed.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 29/05/17 21:23, Andres Freund wrote:
> That's precisely what I pointed out a few emails above, and what I
> suggest changing.
Ah didn't realize that's what you meant. I can try playing with it.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On May 29, 2017 12:25:35 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> Looking at the code more, the xid is only used as parameter for
> SnapBuildBuildSnapshot() which never does anything with that parameter,
> I wonder if it's really needed then.
Not at a computer, but from memory that'll trigger the snapshot export routine to include it. Import in turn requires the xid to check whether the source is still alive. But there are better ways, e.g. using the virtual xactid.
Andres
On 29/05/17 21:28, Andres Freund wrote:
> Not at a computer, but by memory that'll trigger the snapshot export
> routine to include it. Import in turn requires the xid to check if the
> source is still alive. But there's better ways, e.g. using the virtual
> xactid.
It does, and while that's unfortunate, logical replication does not
actually export the snapshots. It uses the USE_SNAPSHOT option, where the
snapshot is just installed into the current transaction but not exported.
So not calling GetTopTransactionId() would solve it, at least for that
in-core use case. I don't see any bad side effects from doing so yet, so
it might be a good enough solution for PG10.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On May 29, 2017 12:41:26 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:
> It does, and while that's unfortunate the logical replication does not
> actually export the snapshots. It uses the USE_SNAPSHOT option where the
> snapshot is just installed into current transaction but not exported. So
> not calling the GetTopTransactionId() would solve it at least for that
> in-core use-case. I don't see any bad side effects from doing so yet, so
> it might be good enough solution for PG10.
In the general case you can't do so, because of vacuum and such. Even for LR we need to make sure the exporting session didn't die badly, deleting the slot. Hence the suggestion to use the virtual xid.
Andres
On 29/05/17 21:44, Andres Freund wrote:
> In the general case you can't do so, because of vacuum and such. Even for
> LR we need to make sure the exporting session didn't die badly, deleting
> the slot. Hence suggestion to use the virtual xid.
I am not quite sure I understand (both the vxid suggestion and the point
about the session dying badly). Maybe we can discuss a bit more when you
get to a computer, so it's easier for you to expand on what you mean.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:
> I am not quite sure I understand (both the vxid suggestion and for the
> session dying badly). Maybe we can discuss bit more when you get to
> computer so it's easier for you to expand on what you mean.
The xid interlock when exporting a snapshot is required because
otherwise it's not generally guaranteed that all resources required for
the snapshot are reserved. In the logical replication case we could
guarantee that otherwise, but there'd be weird-ish edge cases when
erroring out just after exporting a snapshot.
The problem with using the xid as that interlock is that it requires
acquiring an xid - which is something we're going to block against when
building a new catalog snapshot. Afaict that's not entirely required -
all that we need to verify is that the snapshot in the source
transaction is still running. The easiest way for the importer to check
that the source is still alive seems to be to export the virtual
transaction id instead of the xid. Normally we can't store things like
virtual xids on disk, but that concern isn't valid here because exported
snapshots are ephemeral; there's also already a precedent in
predicate.c.
It seems like it'd be fairly easy to change things around that way, but
maybe I'm missing something.
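The interlock described here can be sketched as a toy model: the exported snapshot carries the exporter's virtual transaction id as its liveness token, so the exporter never needs a real xid. This is a Python illustration only, not the PostgreSQL implementation; Cluster, Session, ExportedSnapshot and all their fields are hypothetical names, and the xmin/xmax values are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

VirtualXid = Tuple[int, int]  # (backend id, local transaction counter)


@dataclass
class Session:
    vxid: VirtualXid
    xid: Optional[int] = None  # stays None unless the session writes


@dataclass(frozen=True)
class ExportedSnapshot:
    source_vxid: VirtualXid  # liveness token; no real xid involved
    xmin: int
    xmax: int


class Cluster:
    """Tracks which virtual transactions are currently running."""

    def __init__(self):
        self.running: Set[VirtualXid] = set()

    def start(self, vxid):
        self.running.add(vxid)
        return Session(vxid)

    def finish(self, session):
        self.running.discard(session.vxid)

    def export_snapshot(self, session, xmin, xmax):
        # Exporting does not assign an xid to the exporting session.
        return ExportedSnapshot(session.vxid, xmin, xmax)

    def import_snapshot(self, snap):
        # The importer only has to verify that the source is still running.
        if snap.source_vxid not in self.running:
            raise RuntimeError("source transaction is no longer running")
        return (snap.xmin, snap.xmax)


cluster = Cluster()
exporter = cluster.start((3, 17))
snap = cluster.export_snapshot(exporter, xmin=100, xmax=105)
imported = cluster.import_snapshot(snap)  # accepted while the exporter lives

cluster.finish(exporter)
try:
    cluster.import_snapshot(snap)
    rejected = False
except RuntimeError:
    rejected = True  # a stale snapshot is refused once the exporter is gone
print(imported, exporter.xid, rejected)
```

Because the exporter in this model never acquires an xid, a snapshot builder that waits for every running xid to end would have nothing to wait for on its account.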
- Andres
On 31/05/17 09:24, Andres Freund wrote:
> The easiest way for the importer to check that the source is still alive
> seems to be export the virtual transaction id instead of the xid.
> Normally we can't store things like virtual xids on disk, but that
> concern isn't valid here because exported snapshots are ephemeral,
> there's also already a precedent in predicate.c.
>
> It seems like it'd be fairly easy to change things around that way, but
> maybe I'm missing something.
Okay, thanks for the explanation. Code-wise it does seem simple enough
indeed. I admit I don't know enough about exported snapshots and
snapshot management in general to be able to answer the question of
safety here. That said, it does seem to me like it should work, as the
exported snapshots are just an on-disk representation of in-memory state
that becomes invalid once the in-memory state does.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 31/05/17 11:21, Petr Jelinek wrote:
> Okay, thanks for explanation. Code-wise it does seem simple enough
> indeed. I admit I don't know enough about the exported snapshots and
> snapshot management in general to be able to answer the question of
> safety here. That said, it does seem to me like it should work as the
> exported snapshots are just on disk representation of in-memory state
> that becomes invalid once the in-memory state does.
Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported, and not assign an
xid when the snapshot is only used by the local session (the new option in
PG10). That's a one-line change which impacts only logical
replication/decoding, as opposed to everything else which uses exported
snapshots.
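The narrower change proposed here, assigning an xid only when the snapshot is really exported, can be sketched as follows. This is a toy Python model rather than the actual patch; Txn, build_initial_snapshot and xids_blocking_snapshot_builder are hypothetical names, and the xid counter starts at an arbitrary value.

```python
import itertools

_xid_counter = itertools.count(1000)  # arbitrary starting value


class Txn:
    """A transaction that gets a real xid only on demand."""

    def __init__(self, name):
        self.name = name
        self.xid = None

    def assign_xid(self):
        if self.xid is None:
            self.xid = next(_xid_counter)
        return self.xid


def build_initial_snapshot(txn, export):
    """Assign an xid only if the snapshot is exported to other sessions."""
    if export:
        txn.assign_xid()
    return {"owner": txn.name, "exported": export}


def xids_blocking_snapshot_builder(txns):
    """A later snapshot build has to wait only for transactions with an xid."""
    return [t.xid for t in txns if t.xid is not None]


copy_worker = Txn("table sync COPY")  # uses the snapshot locally
build_initial_snapshot(copy_worker, export=False)

exporter = Txn("external client")  # exports the snapshot to other sessions
build_initial_snapshot(exporter, export=True)

blocking = xids_blocking_snapshot_builder([copy_worker, exporter])
print(copy_worker.xid, blocking)
```

In this model the COPY worker never shows up in the list of xids to wait for, while a session that really exports its snapshot still does; that remaining wait is what the virtual-xid approach is meant to remove as well.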
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:
> Thinking more about this, I am not convinced it's a good idea to change
> exports this late in the cycle. I still think it's best to do the xid
> assignment only when the snapshot is actually exported but don't assign
> xid when the export is only used by the local session (the new option in
> PG10). That's one line change which impacts only logical
> replication/decoding as opposed to everything else which uses exported
> snapshots.
I'm not quite convinced by this argument. Exported snapshot contents
are ephemeral; we can change the format at any time. The wait is fairly
annoying for every user of logical decoding. For me the combination of
those two facts implies that we should rather fix this properly.
- Andres