logical replication busy-waiting on a lock

Started by Jeff Janesover 8 years ago32 messages
#1Jeff Janes
jeff.janes@gmail.com

When I create a subscription in the disabled state, and then later doing
"alter subscription sub enable;", on the master I sometimes get a tight
loop of the deadlock detector:

(log_lock_waits is on, of course)

deadlock_timeout is set to 1s, so I don't know why it seems to be running
several times per millisecond.

47462 idle in transaction 2017-05-26 16:05:20.505 PDT LOG: logical
decoding found initial starting point at 1B/7BAC9D50
47462 idle in transaction 2017-05-26 16:05:20.505 PDT DETAIL: Waiting for
transactions (approximately 9) older than 73326615 to end.
47462 idle in transaction waiting 2017-05-26 16:05:21.505 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.060 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.505 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.398 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.574 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1000.816 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1001.180 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.506 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1001.284 ms
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT DETAIL:
Process holding the lock: 47457. Wait queue: 47462.
47462 idle in transaction waiting 2017-05-26 16:05:21.507 PDT LOG: process
47462 still waiting for ShareLock on transaction 73322726 after 1001.493 ms
.....

And so on out to "after 9616.814", when it finally acquires the lock.

The other process, 47457, is doing the initial COPY of another table as
part of the same publisher/subscriber set.

Cheers,

Jeff

#2Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Jeff Janes (#1)
Re: logical replication busy-waiting on a lock

On 27/05/17 01:25, Jeff Janes wrote:

When I create a subscription in the disabled state, and then later doing
"alter subscription sub enable;", on the master I sometimes get a tight
loop of the deadlock detector:

(log_lock_waits is on, of course)

deadlock_timeout is set to 1s, so I don't know why it seems to be
running several times per millisecond.

.....

And so on out to "after 9616.814", when it finally acquires the lock.

The other process, 47457, is doing the initial COPY of another table as
part of the same publisher/subscriber set.

We lock wait for running transactions in snapshot builder while the
snapshot is being built so I guess that's what you are seeing. I am not
quite sure why the snapshot builder would hold the xid lock for
prolonged period of time though, the XactLockTableWait releases the lock
immediately after acquiring it. In fact AFAICS everything that acquires
ShareLock on xid releases it immediately after acquiring as it's only
used for waits.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Petr Jelinek (#2)
Re: logical replication busy-waiting on a lock

On 27/05/17 15:44, Petr Jelinek wrote:

On 27/05/17 01:25, Jeff Janes wrote:

When I create a subscription in the disabled state, and then later doing
"alter subscription sub enable;", on the master I sometimes get a tight
loop of the deadlock detector:

(log_lock_waits is on, of course)

deadlock_timeout is set to 1s, so I don't know why it seems to be
running several times per millisecond.

.....

And so on out to "after 9616.814", when it finally acquires the lock.

The other process, 47457, is doing the initial COPY of another table as
part of the same publisher/subscriber set.

We lock wait for running transactions in snapshot builder while the
snapshot is being built so I guess that's what you are seeing. I am not
quite sure why the snapshot builder would hold the xid lock for
prolonged period of time though, the XactLockTableWait releases the lock
immediately after acquiring it. In fact AFAICS everything that acquires
ShareLock on xid releases it immediately after acquiring as it's only
used for waits.

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing Masahiko
Sawada reported [1]/messages/by-id/CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc=UpBrZ-_MUxh-Q@mail.gmail.com. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY is
a long transaction).

[1]: /messages/by-id/CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc=UpBrZ-_MUxh-Q@mail.gmail.com
/messages/by-id/CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc=UpBrZ-_MUxh-Q@mail.gmail.com

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#3)
Re: logical replication busy-waiting on a lock

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing Masahiko
Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY is
a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid for some crosscheck, and that's what we're waiting for. Could you check what happens if you don't assign one and just content the error checks out? Not at my computer, just theorizing.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Petr Jelinek (#3)
Re: logical replication busy-waiting on a lock

On Sat, May 27, 2017 at 6:48 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com>
wrote:

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing Masahiko
Sawada reported [1].

Related, but not the same. It would be nice if they didn't block, but if
they do have to block, shouldn't it wait on a semaphore, rather than doing
a tight loop? It looks like maybe a latch didn't get reset when it should
have or something.

[1]:

/messages/by-id/CAD21AoC2KJdavS7MFffmSsRc1dn3V
g_0xmuc%3DUpBrZ-_MUxh-Q%40mail.gmail.com

Cheers,

Jeff

#6Andres Freund
andres@anarazel.de
In reply to: Jeff Janes (#5)
Re: logical replication busy-waiting on a lock

On 2017-05-29 11:38:20 -0700, Jeff Janes wrote:

Related, but not the same. It would be nice if they didn't block, but if
they do have to block, shouldn't it wait on a semaphore, rather than doing
a tight loop? It looks like maybe a latch didn't get reset when it should
have or something.

The code certainly is trying to just block using a lock (on the xid of
the running xact), there shouldn't be any busy looping going on...
There's no latch involved, so something is certainly weird here.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#4)
Re: logical replication busy-waiting on a lock

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing Masahiko
Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY is
a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid for some crosscheck, and that's what we're waiting for. Could you check what happens if you don't assign one and just content the error checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of table
data copy where we create snapshot for one process but then the next one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so it
would work fine (except the snapshot was corrupted of course). I wonder
if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to enforce the txes to stay read-only so it does not
seem safe.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#7)
Re: logical replication busy-waiting on a lock

On May 29, 2017 11:58:05 AM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing

Masahiko

Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY

is

a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid

for some crosscheck, and that's what we're waiting for. Could you
check what happens if you don't assign one and just content the error
checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of
table
data copy where we create snapshot for one process but then the next
one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so it
would work fine (except the snapshot was corrupted of course). I wonder
if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to enforce the txes to stay read-only so it does
not
seem safe.

Read-only txs have no xid ...
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#8)
Re: logical replication busy-waiting on a lock

On 29/05/17 20:59, Andres Freund wrote:

On May 29, 2017 11:58:05 AM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing

Masahiko

Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY

is

a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid

for some crosscheck, and that's what we're waiting for. Could you
check what happens if you don't assign one and just content the error
checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of
table
data copy where we create snapshot for one process but then the next
one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so it
would work fine (except the snapshot was corrupted of course). I wonder
if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to enforce the txes to stay read-only so it does
not
seem safe.

Read-only txs have no xid ...

That's what I mean by hinting, normally they don't but building initial
snapshot in snapshot builder calls GetTopTransactionId() (see
SnapBuildInitialSnapshot()) which will assign it xid.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#9)
Re: logical replication busy-waiting on a lock

On May 29, 2017 12:21:50 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 29/05/17 20:59, Andres Freund wrote:

On May 29, 2017 11:58:05 AM PDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is

actually

running the xid 73322726. In that case that's the same thing

Masahiko

Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a

long

transaction running when the snapshot is being created (and the

COPY

is

a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid

for some crosscheck, and that's what we're waiting for. Could you
check what happens if you don't assign one and just content the

error

checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of
table
data copy where we create snapshot for one process but then the next
one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so

it

would work fine (except the snapshot was corrupted of course). I

wonder

if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to enforce the txes to stay read-only so it does
not
seem safe.

Read-only txs have no xid ...

That's what I mean by hinting, normally they don't but building initial
snapshot in snapshot builder calls GetTopTransactionId() (see
SnapBuildInitialSnapshot()) which will assign it xid.

That's precisely what I pointed out a few emails above, and what I suggest changing.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Petr Jelinek (#9)
Re: logical replication busy-waiting on a lock

On 29/05/17 21:21, Petr Jelinek wrote:

On 29/05/17 20:59, Andres Freund wrote:

On May 29, 2017 11:58:05 AM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is actually
running the xid 73322726. In that case that's the same thing

Masahiko

Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a long
transaction running when the snapshot is being created (and the COPY

is

a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid

for some crosscheck, and that's what we're waiting for. Could you
check what happens if you don't assign one and just content the error
checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of
table
data copy where we create snapshot for one process but then the next
one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so it
would work fine (except the snapshot was corrupted of course). I wonder
if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to enforce the txes to stay read-only so it does
not
seem safe.

Read-only txs have no xid ...

That's what I mean by hinting, normally they don't but building initial
snapshot in snapshot builder calls GetTopTransactionId() (see
SnapBuildInitialSnapshot()) which will assign it xid.

Looking at the code more, the xid is only used as parameter for
SnapBuildBuildSnapshot() which never does anything with that parameter,
I wonder if it's really needed then.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#10)
Re: logical replication busy-waiting on a lock

On 29/05/17 21:23, Andres Freund wrote:

On May 29, 2017 12:21:50 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 29/05/17 20:59, Andres Freund wrote:

On May 29, 2017 11:58:05 AM PDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is

actually

running the xid 73322726. In that case that's the same thing

Masahiko

Sawada reported [1]. Which basically is result of snapshot builder
waiting for transaction to finish, that's normal if there is a

long

transaction running when the snapshot is being created (and the

COPY

is

a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an xid

for some crosscheck, and that's what we're waiting for. Could you
check what happens if you don't assign one and just content the

error

checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of
table
data copy where we create snapshot for one process but then the next
one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so

it

would work fine (except the snapshot was corrupted of course). I

wonder

if we could somehow give it a hint to ignore the read-only txes, but
then we have no way to enforce the txes to stay read-only so it does
not
seem safe.

Read-only txs have no xid ...

That's what I mean by hinting, normally they don't but building initial
snapshot in snapshot builder calls GetTopTransactionId() (see
SnapBuildInitialSnapshot()) which will assign it xid.

That's precisely what I pointed out a few emails above, and what I suggest changing.

Ah didn't realize that's what you meant. I can try playing with it.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#11)
Re: logical replication busy-waiting on a lock

On May 29, 2017 12:25:35 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 29/05/17 21:21, Petr Jelinek wrote:

On 29/05/17 20:59, Andres Freund wrote:

On May 29, 2017 11:58:05 AM PDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

On 27/05/17 17:17, Andres Freund wrote:

On May 27, 2017 9:48:22 AM EDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Actually, I guess it's the pid 47457 (COPY process) who is

actually

running the xid 73322726. In that case that's the same thing

Masahiko

Sawada reported [1]. Which basically is result of snapshot

builder

waiting for transaction to finish, that's normal if there is a

long

transaction running when the snapshot is being created (and the

COPY

is

a long transaction).

Hm. I suspect the issue is that the exported snapshot needs an

xid

for some crosscheck, and that's what we're waiting for. Could you
check what happens if you don't assign one and just content the

error

checks out? Not at my computer, just theorizing.

I don't think that's it, in my opinion it's the parallelization of
table
data copy where we create snapshot for one process but then the

next

one
has to wait for the first one to finish. Before we fixed the
snapshotting, the second one would just use the ondisk snapshot so

it

would work fine (except the snapshot was corrupted of course). I

wonder

if we could somehow give it a hint to ignore the read-only txes,

but

then we have no way to enforce the txes to stay read-only so it

does

not
seem safe.

Read-only txs have no xid ...

That's what I mean by hinting, normally they don't but building

initial

snapshot in snapshot builder calls GetTopTransactionId() (see
SnapBuildInitialSnapshot()) which will assign it xid.

Looking at the code more, the xid is only used as parameter for
SnapBuildBuildSnapshot() which never does anything with that parameter,
I wonder if it's really needed then.

Not at a computer, but by memory that'll trigger the snapshot export routine to include it. Import in turn requires the xid to check if the source is still alive. But there's better ways, e.g. using the virtual xactid.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#13)
Re: logical replication busy-waiting on a lock

On 29/05/17 21:28, Andres Freund wrote:

On May 29, 2017 12:25:35 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

Looking at the code more, the xid is only used as parameter for
SnapBuildBuildSnapshot() which never does anything with that parameter,
I wonder if it's really needed then.

Not at a computer, but by memory that'll trigger the snapshot export routine to include it. Import in turn requires the xid to check if the source is still alive. But there's better ways, e.g. using the virtual xactid.

It does, and while that's unfortunate the logical replication does not
actually export the snapshots. It uses the USE_SNAPSHOT option where the
snapshot is just installed into current transaction but not exported. So
not calling the GetTopTransactionId() would solve it at least for that
in-core use-case. I don't see any bad side effects from doing so yet, so
it might be good enough solution for PG10.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#14)
Re: logical replication busy-waiting on a lock

On May 29, 2017 12:41:26 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 29/05/17 21:28, Andres Freund wrote:

On May 29, 2017 12:25:35 PM PDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Looking at the code more, the xid is only used as parameter for
SnapBuildBuildSnapshot() which never does anything with that

parameter,

I wonder if it's really needed then.

Not at a computer, but by memory that'll trigger the snapshot export

routine to include it. Import in turn requires the xid to check if the
source is still alive. But there's better ways, e.g. using the virtual
xactid.

It does, and while that's unfortunate the logical replication does not
actually export the snapshots. It uses the USE_SNAPSHOT option where
the
snapshot is just installed into current transaction but not exported.
So
not calling the GetTopTransactionId() would solve it at least for that
in-core use-case. I don't see any bad side effects from doing so yet,
so
it might be good enough solution for PG10.

In the general case you can't do so, because of vacuum and such. Even for LR we need to make sure the exporting session didn't die badly, deleting the slot. Hence suggestion to use the virtual xid.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#15)
Re: logical replication busy-waiting on a lock

On 29/05/17 21:44, Andres Freund wrote:

On May 29, 2017 12:41:26 PM PDT, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote:

On 29/05/17 21:28, Andres Freund wrote:

On May 29, 2017 12:25:35 PM PDT, Petr Jelinek

<petr.jelinek@2ndquadrant.com> wrote:

Looking at the code more, the xid is only used as parameter for
SnapBuildBuildSnapshot() which never does anything with that

parameter,

I wonder if it's really needed then.

Not at a computer, but by memory that'll trigger the snapshot export

routine to include it. Import in turn requires the xid to check if the
source is still alive. But there's better ways, e.g. using the virtual
xactid.

It does, and while that's unfortunate the logical replication does not
actually export the snapshots. It uses the USE_SNAPSHOT option where
the
snapshot is just installed into current transaction but not exported.
So
not calling the GetTopTransactionId() would solve it at least for that
in-core use-case. I don't see any bad side effects from doing so yet,
so
it might be good enough solution for PG10.

In the general case you can't do so, because of vacuum and such. Even for LR we need to make sure the exporting session didn't die badly, deleting the slot. Hence suggestion to use the virtual xid.

I am not quite sure I understand (both the vxid suggestion and for the
session dying badly). Maybe we can discuss bit more when you get to
computer so it's easier for you to expand on what you mean.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#16)
Re: logical replication busy-waiting on a lock

On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:

I am not quite sure I understand (both the vxid suggestion and for the
session dying badly). Maybe we can discuss bit more when you get to
computer so it's easier for you to expand on what you mean.

The xid interlock when exporting a snapshot is required because
otherwise it's not generally guaranteed that all resourced required for
the snapshot are reserved. In the logical replication case we could
guarantee that otherwise, but there'd be weird-ish edge cases when
erroring out just after exporting a snapshot.

The problem with using the xid as that interlock is that it requires
acquiring an xid - which is something we're going to block against when
building a new catalog snapshot. Afaict that's not entirely required -
all that we need to verify is that the snapshot in the source
transaction is still running. The easiest way for the importer to check
that the source is still alive seems to be export the virtual
transaction id instead of the xid. Normally we can't store things like
virtual xids on disk, but that concern isn't valid here because exported
snapshots are ephemeral, there's also already a precedent in
predicate.c.

It seems like it'd be fairly easy to change things around that way, but
maybe I'm missing something.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#17)
Re: logical replication busy-waiting on a lock

On 31/05/17 09:24, Andres Freund wrote:

On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:

I am not quite sure I understand (both the vxid suggestion and for the
session dying badly). Maybe we can discuss bit more when you get to
computer so it's easier for you to expand on what you mean.

The xid interlock when exporting a snapshot is required because
otherwise it's not generally guaranteed that all resourced required for
the snapshot are reserved. In the logical replication case we could
guarantee that otherwise, but there'd be weird-ish edge cases when
erroring out just after exporting a snapshot.

The problem with using the xid as that interlock is that it requires
acquiring an xid - which is something we're going to block against when
building a new catalog snapshot. Afaict that's not entirely required -
all that we need to verify is that the snapshot in the source
transaction is still running. The easiest way for the importer to check
that the source is still alive seems to be export the virtual
transaction id instead of the xid. Normally we can't store things like
virtual xids on disk, but that concern isn't valid here because exported
snapshots are ephemeral, there's also already a precedent in
predicate.c.

It seems like it'd be fairly easy to change things around that way, but
maybe I'm missing something.

Okay, thanks for explanation. Code-wise it does seem simple enough
indeed. I admit I don't know enough about the exported snapshots and
snapshot management in general to be able to answer the question of
safety here. That said, it does seem to me like it should work as the
exported snapshots are just on disk representation of in-memory state
that becomes invalid once the in-memory state does.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Petr Jelinek (#18)
Re: logical replication busy-waiting on a lock

On 31/05/17 11:21, Petr Jelinek wrote:

On 31/05/17 09:24, Andres Freund wrote:

On 2017-05-29 23:49:33 +0200, Petr Jelinek wrote:

I am not quite sure I understand (both the vxid suggestion and for the
session dying badly). Maybe we can discuss bit more when you get to
computer so it's easier for you to expand on what you mean.

The xid interlock when exporting a snapshot is required because
otherwise it's not generally guaranteed that all resourced required for
the snapshot are reserved. In the logical replication case we could
guarantee that otherwise, but there'd be weird-ish edge cases when
erroring out just after exporting a snapshot.

The problem with using the xid as that interlock is that it requires
acquiring an xid - which is something we're going to block against when
building a new catalog snapshot. Afaict that's not entirely required -
all that we need to verify is that the snapshot in the source
transaction is still running. The easiest way for the importer to check
that the source is still alive seems to be export the virtual
transaction id instead of the xid. Normally we can't store things like
virtual xids on disk, but that concern isn't valid here because exported
snapshots are ephemeral, there's also already a precedent in
predicate.c.

It seems like it'd be fairly easy to change things around that way, but
maybe I'm missing something.

Okay, thanks for explanation. Code-wise it does seem simple enough
indeed. I admit I don't know enough about the exported snapshots and
snapshot management in general to be able to answer the question of
safety here. That said, it does seem to me like it should work as the
exported snapshots are just on disk representation of in-memory state
that becomes invalid once the in-memory state does.

Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported but don't assign
xid when the export is only used by the local session (the new option in
PG10). That's one line change which impacts only logical
replication/decoding as opposed to everything else which uses exported
snapshots.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#19)
Re: logical replication busy-waiting on a lock

On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:

Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported but don't assign
xid when the export is only used by the local session (the new option in
PG10). That's one line change which impacts only logical
replication/decoding as opposed to everything else which uses exported
snapshots.

I'm not quite convinced by this argument. Exported snapshot contents
are ephemeral, we can change the format at any time. The wait is fairly
annoying for every user of logical decoding. For me the combination of
those two fact implies that we should rather fix this properly.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#20)
Re: logical replication busy-waiting on a lock

On Thu, Jun 1, 2017 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:

Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported but don't assign
xid when the export is only used by the local session (the new option in
PG10). That's one line change which impacts only logical
replication/decoding as opposed to everything else which uses exported
snapshots.

I'm not quite convinced by this argument. Exported snapshot contents
are ephemeral, we can change the format at any time. The wait is fairly
annoying for every user of logical decoding. For me the combination of
those two fact implies that we should rather fix this properly.

+1. The change Andres is proposing doesn't sound like it will be
terribly high-impact, and I think we'll be happier in the long run if
we install real fixes instead of kludging it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Robert Haas (#21)
2 attachment(s)
Re: logical replication busy-waiting on a lock

On 02/06/17 03:38, Robert Haas wrote:

On Thu, Jun 1, 2017 at 2:28 PM, Andres Freund <andres@anarazel.de> wrote:

On 2017-06-01 14:17:44 +0200, Petr Jelinek wrote:

Thinking more about this, I am not convinced it's a good idea to change
exports this late in the cycle. I still think it's best to do the xid
assignment only when the snapshot is actually exported but don't assign
xid when the export is only used by the local session (the new option in
PG10). That's one line change which impacts only logical
replication/decoding as opposed to everything else which uses exported
snapshots.

I'm not quite convinced by this argument. Exported snapshot contents
are ephemeral, we can change the format at any time. The wait is fairly
annoying for every user of logical decoding. For me the combination of
those two fact implies that we should rather fix this properly.

+1. The change Andres is proposing doesn't sound like it will be
terribly high-impact, and I think we'll be happier in the long run if
we install real fixes instead of kludging it.

All right, here is my rough attempt on doing what Andres has suggested,
going straight from xid to vxid, seems to work fine in my tests and
parallel pg_dump seems happy with it as well.

The second patch removes the xid parameter from SnapBuildBuildSnapshot
as it's not used for anything and skips the GetTopTransactionId() call
as well. That should solve the original problem reported here.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0002-Don-t-assign-xid-to-logical-decoding-snapshots.patchinvalid/octet-stream; name=0002-Don-t-assign-xid-to-logical-decoding-snapshots.patchDownload
From aee3b4be0faf4ccedb9a8ff6991428547ad1ad38 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:07:49 +0200
Subject: [PATCH 2/2] Don't assign xid to logical decoding snapshots

---
 src/backend/replication/logical/snapbuild.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8848f5b..e06aa09 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -262,7 +262,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -463,7 +463,7 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+SnapBuildBuildSnapshot(SnapBuild *builder)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -562,7 +562,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyPgXact->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyPgXact->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder, GetTopTransactionId());
+	snap = SnapBuildBuildSnapshot(builder);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -679,7 +679,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder, TransactionId xid)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -743,7 +743,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+			builder->snapshot = SnapBuildBuildSnapshot(builder);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1061,7 +1061,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1831,7 +1831,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+	builder->snapshot = SnapBuildBuildSnapshot(builder);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
-- 
2.7.4

0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patchinvalid/octet-stream; name=0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patchDownload
From 4f7eb5b8fdbab2539731414e81d7a7419a83e5b1 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:06:22 +0200
Subject: [PATCH 1/2] Use virtual transaction instead of normal ones in
 exported snapshots

---
 doc/src/sgml/ref/set_transaction.sgml |  8 ++---
 src/backend/storage/ipc/procarray.c   | 11 +++---
 src/backend/storage/lmgr/predicate.c  | 22 ++++++------
 src/backend/utils/time/snapmgr.c      | 68 ++++++++++++++++++++++++-----------
 src/include/storage/predicate.h       |  3 +-
 src/include/storage/procarray.h       |  2 +-
 6 files changed, 72 insertions(+), 42 deletions(-)

diff --git a/doc/src/sgml/ref/set_transaction.sgml b/doc/src/sgml/ref/set_transaction.sgml
index ca55a5b..116315f 100644
--- a/doc/src/sgml/ref/set_transaction.sgml
+++ b/doc/src/sgml/ref/set_transaction.sgml
@@ -221,9 +221,9 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
 <programlisting>
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 SELECT pg_export_snapshot();
- pg_export_snapshot
---------------------
- 000003A1-1
+ pg_export_snapshot  
+---------------------
+ 00000003-0000001B-1
 (1 row)
 </programlisting>
 
@@ -233,7 +233,7 @@ SELECT pg_export_snapshot();
 
 <programlisting>
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-SET TRANSACTION SNAPSHOT '000003A1-1';
+SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
 </programlisting></para>
  </refsect1>
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8a71536..c9ae5bf 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1793,14 +1793,15 @@ GetSnapshotData(Snapshot snapshot)
  * Returns TRUE if successful, FALSE if source xact is no longer running.
  */
 bool
-ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
+ProcArrayInstallImportedXmin(TransactionId xmin,
+							 VirtualTransactionId *sourcevxid)
 {
 	bool		result = false;
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
 	Assert(TransactionIdIsNormal(xmin));
-	if (!TransactionIdIsNormal(sourcexid))
+	if (!sourcevxid || !VirtualTransactionIdIsValid(*sourcevxid))
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
@@ -1817,8 +1818,10 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		if (pgxact->vacuumFlags & PROC_IN_VACUUM)
 			continue;
 
-		xid = pgxact->xid;		/* fetch just once */
-		if (xid != sourcexid)
+		/* We are only interested in the specific virtual transaction. */
+		if (proc->backendId != sourcevxid->backendId)
+			continue;
+		if (proc->lxid != sourcevxid->localTransactionId)
 			continue;
 
 		/*
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 27c4af9..64acd6e 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -148,7 +148,7 @@
  * predicate lock maintenance
  *		GetSerializableTransactionSnapshot(Snapshot snapshot)
  *		SetSerializableTransactionSnapshot(Snapshot snapshot,
- *										   TransactionId sourcexid)
+ *										   VirtualTransactionId *sourcevxid)
  *		RegisterPredicateLockingXid(void)
  *		PredicateLockRelation(Relation relation, Snapshot snapshot)
  *		PredicateLockPage(Relation relation, BlockNumber blkno,
@@ -434,7 +434,7 @@ static uint32 predicatelock_hash(const void *key, Size keysize);
 static void SummarizeOldestCommittedSxact(void);
 static Snapshot GetSafeSnapshot(Snapshot snapshot);
 static Snapshot GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-									  TransactionId sourcexid);
+									  VirtualTransactionId *sourcevxid);
 static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag);
 static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
 						  PREDICATELOCKTARGETTAG *parent);
@@ -1510,7 +1510,7 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		 * one passed to it, but we avoid assuming that here.
 		 */
 		snapshot = GetSerializableTransactionSnapshotInt(origSnapshot,
-													   InvalidTransactionId);
+													   NULL);
 
 		if (MySerializableXact == InvalidSerializableXact)
 			return snapshot;	/* no concurrent r/w xacts; it's safe */
@@ -1643,7 +1643,7 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
 		return GetSafeSnapshot(snapshot);
 
 	return GetSerializableTransactionSnapshotInt(snapshot,
-												 InvalidTransactionId);
+												 NULL);
 }
 
 /*
@@ -1658,7 +1658,7 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
  */
 void
 SetSerializableTransactionSnapshot(Snapshot snapshot,
-								   TransactionId sourcexid)
+								   VirtualTransactionId *sourcevxid)
 {
 	Assert(IsolationIsSerializable());
 
@@ -1673,7 +1673,7 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("a snapshot-importing transaction must not be READ ONLY DEFERRABLE")));
 
-	(void) GetSerializableTransactionSnapshotInt(snapshot, sourcexid);
+	(void) GetSerializableTransactionSnapshotInt(snapshot, sourcevxid);
 }
 
 /*
@@ -1687,7 +1687,7 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
  */
 static Snapshot
 GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-									  TransactionId sourcexid)
+									  VirtualTransactionId *sourcevxid)
 {
 	PGPROC	   *proc;
 	VirtualTransactionId vxid;
@@ -1741,17 +1741,17 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	} while (!sxact);
 
 	/* Get the snapshot, or check that it's safe to use */
-	if (!TransactionIdIsValid(sourcexid))
+	if (!sourcevxid)
 		snapshot = GetSnapshotData(snapshot);
-	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
 	{
 		ReleasePredXact(sxact);
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+	   errdetail("The source virtual transaction %d/%u is not running anymore.",
+					 sourcevxid->backendId, sourcevxid->localTransactionId)));
 	}
 
 	/*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index b3d4fe3..b8e1933 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -211,9 +211,9 @@ static Snapshot FirstXactSnapshot = NULL;
 
 /* Define pathname of exported-snapshot files */
 #define SNAPSHOT_EXPORT_DIR "pg_snapshots"
-#define XactExportFilePath(path, xid, num, suffix) \
-	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d%s", \
-			 xid, num, suffix)
+#define XactExportFilePath(path, backenId, lxid, num, suffix) \
+	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%08X-%d%s", \
+			 backenId, lxid, num, suffix)
 
 /* Current xact's exported snapshots (a list of Snapshot structs) */
 static List *exportedSnapshots = NIL;
@@ -558,7 +558,7 @@ SnapshotSetCommandId(CommandId curcid)
  * in GetTransactionSnapshot.
  */
 static void
-SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
+SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
 					   PGPROC *sourceproc)
 {
 	/* Caller should have checked this already */
@@ -617,12 +617,12 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 					 errmsg("could not import the requested snapshot"),
 			   errdetail("The source transaction is not running anymore.")));
 	}
-	else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcevxid))
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+	   errdetail("The source virtual transaction %d/%u is not running anymore.",
+					 sourcevxid->backendId, sourcevxid->localTransactionId)));
 
 	/*
 	 * In transaction-snapshot mode, the first snapshot must live until end of
@@ -632,7 +632,7 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 	if (IsolationUsesXactSnapshot())
 	{
 		if (IsolationIsSerializable())
-			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcexid);
+			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcevxid);
 		/* Make a saved copy */
 		CurrentSnapshot = CopySnapshot(CurrentSnapshot);
 		FirstXactSnapshot = CurrentSnapshot;
@@ -1075,7 +1075,6 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 */
 	if (exportedSnapshots != NIL)
 	{
-		TransactionId myxid = GetTopTransactionId();
 		int			i;
 		char		buf[MAXPGPATH];
 		ListCell   *lc;
@@ -1087,7 +1086,7 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 		 */
 		for (i = 1; i <= list_length(exportedSnapshots); i++)
 		{
-			XactExportFilePath(buf, myxid, i, "");
+			XactExportFilePath(buf, MyProc->backendId, MyProc->lxid, i, "");
 			if (unlink(buf))
 				elog(WARNING, "could not unlink file \"%s\": %m", buf);
 		}
@@ -1183,9 +1182,9 @@ ExportSnapshot(Snapshot snapshot)
 	 */
 
 	/*
-	 * This will assign a transaction ID if we do not yet have one.
+	 * Get our transaction ID if there is one, to include in the snapshot.
 	 */
-	topXid = GetTopTransactionId();
+	topXid = GetTopTransactionIdIfAny();
 
 	/*
 	 * We cannot export a snapshot from a subtransaction because there's no
@@ -1226,7 +1225,7 @@ ExportSnapshot(Snapshot snapshot)
 	 */
 	initStringInfo(&buf);
 
-	appendStringInfo(&buf, "xid:%u\n", topXid);
+	appendStringInfo(&buf, "vxid:%d/%u\n", MyProc->backendId, MyProc->lxid);
 	appendStringInfo(&buf, "dbid:%u\n", MyDatabaseId);
 	appendStringInfo(&buf, "iso:%d\n", XactIsoLevel);
 	appendStringInfo(&buf, "ro:%d\n", XactReadOnly);
@@ -1245,7 +1244,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * xmax.  (We need not make the same check for subxip[] members, see
 	 * snapshot.h.)
 	 */
-	addTopXid = TransactionIdPrecedes(topXid, snapshot->xmax) ? 1 : 0;
+	addTopXid = (TransactionIdIsValid(topXid) &&
+		TransactionIdPrecedes(topXid, snapshot->xmax)) ? 1 : 0;
 	appendStringInfo(&buf, "xcnt:%d\n", snapshot->xcnt + addTopXid);
 	for (i = 0; i < snapshot->xcnt; i++)
 		appendStringInfo(&buf, "xip:%u\n", snapshot->xip[i]);
@@ -1276,7 +1276,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * ensures that no other backend can read an incomplete file
 	 * (ImportSnapshot won't allow it because of its valid-characters check).
 	 */
-	XactExportFilePath(pathtmp, topXid, list_length(exportedSnapshots), ".tmp");
+	XactExportFilePath(pathtmp, MyProc->backendId, MyProc->lxid,
+					   list_length(exportedSnapshots), ".tmp");
 	if (!(f = AllocateFile(pathtmp, PG_BINARY_W)))
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -1298,7 +1299,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * Now that we have written everything into a .tmp file, rename the file
 	 * to remove the .tmp suffix.
 	 */
-	XactExportFilePath(path, topXid, list_length(exportedSnapshots), "");
+	XactExportFilePath(path, MyProc->backendId, MyProc->lxid,
+					   list_length(exportedSnapshots), "");
 
 	if (rename(pathtmp, path) < 0)
 		ereport(ERROR,
@@ -1384,6 +1386,30 @@ parseXidFromText(const char *prefix, char **s, const char *filename)
 	return val;
 }
 
+static void
+parseVxidFromText(const char *prefix, char **s, const char *filename,
+				  VirtualTransactionId *vxid)
+{
+	char	   *ptr = *s;
+	int			prefixlen = strlen(prefix);
+
+	if (strncmp(ptr, prefix, prefixlen) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	ptr += prefixlen;
+	if (sscanf(ptr, "%d/%u", &vxid->backendId, &vxid->localTransactionId) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	ptr = strchr(ptr, '\n');
+	if (!ptr)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	*s = ptr + 1;
+}
+
 /*
  * ImportSnapshot
  *		Import a previously exported snapshot.  The argument should be a
@@ -1399,7 +1425,7 @@ ImportSnapshot(const char *idstr)
 	char	   *filebuf;
 	int			xcnt;
 	int			i;
-	TransactionId src_xid;
+	VirtualTransactionId src_vxid;
 	Oid			src_dbid;
 	int			src_isolevel;
 	bool		src_readonly;
@@ -1463,7 +1489,7 @@ ImportSnapshot(const char *idstr)
 	 */
 	memset(&snapshot, 0, sizeof(snapshot));
 
-	src_xid = parseXidFromText("xid:", &filebuf, path);
+	parseVxidFromText("vxid:", &filebuf, path, &src_vxid);
 	/* we abuse parseXidFromText a bit here ... */
 	src_dbid = parseXidFromText("dbid:", &filebuf, path);
 	src_isolevel = parseIntFromText("iso:", &filebuf, path);
@@ -1513,7 +1539,7 @@ ImportSnapshot(const char *idstr)
 	 * don't trouble to check the array elements, just the most critical
 	 * fields.
 	 */
-	if (!TransactionIdIsNormal(src_xid) ||
+	if (!VirtualTransactionIdIsValid(src_vxid) ||
 		!OidIsValid(src_dbid) ||
 		!TransactionIdIsNormal(snapshot.xmin) ||
 		!TransactionIdIsNormal(snapshot.xmax))
@@ -1554,7 +1580,7 @@ ImportSnapshot(const char *idstr)
 			  errmsg("cannot import a snapshot from a different database")));
 
 	/* OK, install the snapshot */
-	SetTransactionSnapshot(&snapshot, src_xid, NULL);
+	SetTransactionSnapshot(&snapshot, &src_vxid, NULL);
 }
 
 /*
@@ -2141,5 +2167,5 @@ RestoreSnapshot(char *start_address)
 void
 RestoreTransactionSnapshot(Snapshot snapshot, void *master_pgproc)
 {
-	SetTransactionSnapshot(snapshot, InvalidTransactionId, master_pgproc);
+	SetTransactionSnapshot(snapshot, NULL, master_pgproc);
 }
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 8f9ea29..671d953 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -14,6 +14,7 @@
 #ifndef PREDICATE_H
 #define PREDICATE_H
 
+#include "storage/lock.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
 
@@ -46,7 +47,7 @@ extern bool PageIsPredicateLocked(Relation relation, BlockNumber blkno);
 /* predicate lock maintenance */
 extern Snapshot GetSerializableTransactionSnapshot(Snapshot snapshot);
 extern void SetSerializableTransactionSnapshot(Snapshot snapshot,
-								   TransactionId sourcexid);
+								   VirtualTransactionId *sourcevxid);
 extern void RegisterPredicateLockingXid(TransactionId xid);
 extern void PredicateLockRelation(Relation relation, Snapshot snapshot);
 extern void PredicateLockPage(Relation relation, BlockNumber blkno, Snapshot snapshot);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 22955a7..5cf8ff7 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -82,7 +82,7 @@ extern int	GetMaxSnapshotSubxidCount(void);
 extern Snapshot GetSnapshotData(Snapshot snapshot);
 
 extern bool ProcArrayInstallImportedXmin(TransactionId xmin,
-							 TransactionId sourcexid);
+							 VirtualTransactionId *sourcevxid);
 extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
 
 extern RunningTransactions GetRunningTransactionData(void);
-- 
2.7.4

#23Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#22)
Re: logical replication busy-waiting on a lock

Hi,

On 2017-06-03 03:03:20 +0200, Petr Jelinek wrote:

All right, here is my rough attempt on doing what Andres has suggested,
going straight from xid to vxid, seems to work fine in my tests and
parallel pg_dump seems happy with it as well.

Cool!

The second patch removes the xid parameter from SnapBuildBuildSnapshot
as it's not used for anything and skips the GetTopTransactionId() call
as well.

Makes sense.

That should solve the original problem reported here.

Did you verify that?

From 4f7eb5b8fdbab2539731414e81d7a7419a83e5b1 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:06:22 +0200
Subject: [PATCH 1/2] Use virtual transaction instead of normal ones in
exported snapshots

@@ -1817,8 +1818,10 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
if (pgxact->vacuumFlags & PROC_IN_VACUUM)
continue;

- xid = pgxact->xid; /* fetch just once */

Hm, I don't quite understand what the 'fetch just once' bit was trying
to achieve here (not to speak of the fact that it's simply not
guaranteed this way). Therefore I think you're right to simply
disregard it.

@@ -1741,17 +1741,17 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
} while (!sxact);

/* Get the snapshot, or check that it's safe to use */
-	if (!TransactionIdIsValid(sourcexid))
+	if (!sourcevxid)
snapshot = GetSnapshotData(snapshot);
-	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
{
ReleasePredXact(sxact);
LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+	   errdetail("The source virtual transaction %d/%u is not running anymore.",
+					 sourcevxid->backendId, sourcevxid->localTransactionId)));

Hm, this is a harder to read. Wonder if we should add a pid field, that'd
make it a bit easier to interpret?

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#23)
2 attachment(s)
Re: logical replication busy-waiting on a lock

On 03/06/17 03:25, Andres Freund wrote:

That should solve the original problem reported here.

Did you verify that?

Yes, I tried to manually create multiple exported logical decoding
snapshots in parallel and read data from them and it worked fine, while
it blocked before.

@@ -1741,17 +1741,17 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
} while (!sxact);

/* Get the snapshot, or check that it's safe to use */
-	if (!TransactionIdIsValid(sourcexid))
+	if (!sourcevxid)
snapshot = GetSnapshotData(snapshot);
-	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
{
ReleasePredXact(sxact);
LWLockRelease(SerializableXactHashLock);
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+	   errdetail("The source virtual transaction %d/%u is not running anymore.",
+					 sourcevxid->backendId, sourcevxid->localTransactionId)));

Hm, this is a harder to read. Wonder if we should add a pid field, that'd
make it a bit easier to interpret?

Agreed, see attached. We have to pass the pid around a bit but I don't
think it's too bad.

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
but I don't have better solution there.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

0002-Don-t-assign-xid-to-logical-decoding-snapshots.patchinvalid/octet-stream; name=0002-Don-t-assign-xid-to-logical-decoding-snapshots.patchDownload
From b4ad784fcb9042a509ee5ede7f53a24f7f6cac88 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:07:49 +0200
Subject: [PATCH 2/2] Don't assign xid to logical decoding snapshots

---
 src/backend/replication/logical/snapbuild.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8848f5b..e06aa09 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -262,7 +262,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -463,7 +463,7 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+SnapBuildBuildSnapshot(SnapBuild *builder)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -562,7 +562,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyPgXact->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyPgXact->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder, GetTopTransactionId());
+	snap = SnapBuildBuildSnapshot(builder);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -679,7 +679,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder, TransactionId xid)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -743,7 +743,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+			builder->snapshot = SnapBuildBuildSnapshot(builder);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1061,7 +1061,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1831,7 +1831,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+	builder->snapshot = SnapBuildBuildSnapshot(builder);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
-- 
2.7.4

0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patchinvalid/octet-stream; name=0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patchDownload
From 37c90cc2563cf1517ac6d917a632ba4beb32bc8c Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:06:22 +0200
Subject: [PATCH 1/2] Use virtual transaction instead of normal ones in
 exported snapshots

---
 doc/src/sgml/ref/set_transaction.sgml |  8 ++--
 src/backend/storage/ipc/procarray.c   | 11 +++--
 src/backend/storage/ipc/sinvaladt.c   | 21 +++++++---
 src/backend/storage/lmgr/predicate.c  | 26 +++++++-----
 src/backend/utils/time/snapmgr.c      | 76 +++++++++++++++++++++++++----------
 src/include/storage/predicate.h       |  4 +-
 src/include/storage/procarray.h       |  2 +-
 src/include/storage/sinvaladt.h       |  1 +
 8 files changed, 101 insertions(+), 48 deletions(-)

diff --git a/doc/src/sgml/ref/set_transaction.sgml b/doc/src/sgml/ref/set_transaction.sgml
index ca55a5b..116315f 100644
--- a/doc/src/sgml/ref/set_transaction.sgml
+++ b/doc/src/sgml/ref/set_transaction.sgml
@@ -221,9 +221,9 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
 <programlisting>
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 SELECT pg_export_snapshot();
- pg_export_snapshot
---------------------
- 000003A1-1
+ pg_export_snapshot  
+---------------------
+ 00000003-0000001B-1
 (1 row)
 </programlisting>
 
@@ -233,7 +233,7 @@ SELECT pg_export_snapshot();
 
 <programlisting>
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-SET TRANSACTION SNAPSHOT '000003A1-1';
+SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
 </programlisting></para>
  </refsect1>
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8a71536..dfddfc4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1793,14 +1793,15 @@ GetSnapshotData(Snapshot snapshot)
  * Returns TRUE if successful, FALSE if source xact is no longer running.
  */
 bool
-ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
+ProcArrayInstallImportedXmin(TransactionId xmin,
+							 VirtualTransactionId *sourcevxid)
 {
 	bool		result = false;
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
 	Assert(TransactionIdIsNormal(xmin));
-	if (!TransactionIdIsNormal(sourcexid))
+	if (!sourcevxid)
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
@@ -1817,8 +1818,10 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		if (pgxact->vacuumFlags & PROC_IN_VACUUM)
 			continue;
 
-		xid = pgxact->xid;		/* fetch just once */
-		if (xid != sourcexid)
+		/* We are only interested in the specific virtual transaction. */
+		if (proc->backendId != sourcevxid->backendId)
+			continue;
+		if (proc->lxid != sourcevxid->localTransactionId)
 			continue;
 
 		/*
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 18302ca..78ad145 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -193,6 +193,7 @@ static SISeg *shmInvalBuffer;	/* pointer to the shared inval buffer */
 
 
 static LocalTransactionId nextLocalTransactionId;
+static LocalTransactionId lastLocalTransactionId = InvalidLocalTransactionId;
 
 static void CleanupInvalidationState(int status, Datum arg);
 
@@ -764,17 +765,27 @@ SICleanupQueue(bool callerHasWriteLock, int minFree)
  * within a short interval, successive procs occupying the same backend ID
  * slot should use a consecutive sequence of local IDs, which is implemented
  * by copying nextLocalTransactionId as seen above.
+ *
+ * The return value of this function is remembered for future use.
  */
 LocalTransactionId
 GetNextLocalTransactionId(void)
 {
-	LocalTransactionId result;
-
 	/* loop to avoid returning InvalidLocalTransactionId at wraparound */
 	do
 	{
-		result = nextLocalTransactionId++;
-	} while (!LocalTransactionIdIsValid(result));
+		lastLocalTransactionId = nextLocalTransactionId++;
+	} while (!LocalTransactionIdIsValid(lastLocalTransactionId));
 
-	return result;
+	return lastLocalTransactionId;
+}
+
+/*
+ * Get the last local LocalTransactionId which was allocated by
+ * GetNextLocalTransactionId
+ */
+LocalTransactionId
+GetLastLocalTransactionId(void)
+{
+	return lastLocalTransactionId;
 }
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 27c4af9..bce505a 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -148,7 +148,7 @@
  * predicate lock maintenance
  *		GetSerializableTransactionSnapshot(Snapshot snapshot)
  *		SetSerializableTransactionSnapshot(Snapshot snapshot,
- *										   TransactionId sourcexid)
+ *										   VirtualTransactionId *sourcevxid)
  *		RegisterPredicateLockingXid(void)
  *		PredicateLockRelation(Relation relation, Snapshot snapshot)
  *		PredicateLockPage(Relation relation, BlockNumber blkno,
@@ -434,7 +434,8 @@ static uint32 predicatelock_hash(const void *key, Size keysize);
 static void SummarizeOldestCommittedSxact(void);
 static Snapshot GetSafeSnapshot(Snapshot snapshot);
 static Snapshot GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-									  TransactionId sourcexid);
+									  VirtualTransactionId *sourcevxid,
+									  int sourcepid);
 static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag);
 static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
 						  PREDICATELOCKTARGETTAG *parent);
@@ -1510,7 +1511,7 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		 * one passed to it, but we avoid assuming that here.
 		 */
 		snapshot = GetSerializableTransactionSnapshotInt(origSnapshot,
-													   InvalidTransactionId);
+														 NULL, InvalidPid);
 
 		if (MySerializableXact == InvalidSerializableXact)
 			return snapshot;	/* no concurrent r/w xacts; it's safe */
@@ -1643,7 +1644,7 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
 		return GetSafeSnapshot(snapshot);
 
 	return GetSerializableTransactionSnapshotInt(snapshot,
-												 InvalidTransactionId);
+												 NULL, InvalidPid);
 }
 
 /*
@@ -1658,7 +1659,8 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
  */
 void
 SetSerializableTransactionSnapshot(Snapshot snapshot,
-								   TransactionId sourcexid)
+								   VirtualTransactionId *sourcevxid,
+								   int sourcepid)
 {
 	Assert(IsolationIsSerializable());
 
@@ -1673,7 +1675,8 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("a snapshot-importing transaction must not be READ ONLY DEFERRABLE")));
 
-	(void) GetSerializableTransactionSnapshotInt(snapshot, sourcexid);
+	(void) GetSerializableTransactionSnapshotInt(snapshot, sourcevxid,
+												 sourcepid);
 }
 
 /*
@@ -1687,7 +1690,8 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
  */
 static Snapshot
 GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-									  TransactionId sourcexid)
+									  VirtualTransactionId *sourcevxid,
+									  int sourcepid)
 {
 	PGPROC	   *proc;
 	VirtualTransactionId vxid;
@@ -1741,17 +1745,17 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	} while (!sxact);
 
 	/* Get the snapshot, or check that it's safe to use */
-	if (!TransactionIdIsValid(sourcexid))
+	if (!sourcevxid)
 		snapshot = GetSnapshotData(snapshot);
-	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
 	{
 		ReleasePredXact(sxact);
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+			errdetail("The source process with pid %d is not running anymore.",
+						sourcepid)));
 	}
 
 	/*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index b3d4fe3..ba1e78e 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -58,6 +58,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinval.h"
+#include "storage/sinvaladt.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -211,9 +212,9 @@ static Snapshot FirstXactSnapshot = NULL;
 
 /* Define pathname of exported-snapshot files */
 #define SNAPSHOT_EXPORT_DIR "pg_snapshots"
-#define XactExportFilePath(path, xid, num, suffix) \
-	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d%s", \
-			 xid, num, suffix)
+#define XactExportFilePath(path, backenId, lxid, num, suffix) \
+	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%08X-%d%s", \
+			 backenId, lxid, num, suffix)
 
 /* Current xact's exported snapshots (a list of Snapshot structs) */
 static List *exportedSnapshots = NIL;
@@ -558,8 +559,8 @@ SnapshotSetCommandId(CommandId curcid)
  * in GetTransactionSnapshot.
  */
 static void
-SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
-					   PGPROC *sourceproc)
+SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
+					   int sourcepid, PGPROC *sourceproc)
 {
 	/* Caller should have checked this already */
 	Assert(!FirstSnapshotSet);
@@ -617,12 +618,12 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 					 errmsg("could not import the requested snapshot"),
 			   errdetail("The source transaction is not running anymore.")));
 	}
-	else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcevxid))
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+			errdetail("The source process with pid %d is not running anymore.",
+						sourcepid)));
 
 	/*
 	 * In transaction-snapshot mode, the first snapshot must live until end of
@@ -632,7 +633,8 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 	if (IsolationUsesXactSnapshot())
 	{
 		if (IsolationIsSerializable())
-			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcexid);
+			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcevxid,
+											   sourcepid);
 		/* Make a saved copy */
 		CurrentSnapshot = CopySnapshot(CurrentSnapshot);
 		FirstXactSnapshot = CurrentSnapshot;
@@ -1075,7 +1077,6 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 */
 	if (exportedSnapshots != NIL)
 	{
-		TransactionId myxid = GetTopTransactionId();
 		int			i;
 		char		buf[MAXPGPATH];
 		ListCell   *lc;
@@ -1087,7 +1088,8 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 		 */
 		for (i = 1; i <= list_length(exportedSnapshots); i++)
 		{
-			XactExportFilePath(buf, myxid, i, "");
+			XactExportFilePath(buf, MyProc->backendId,
+							   GetLastLocalTransactionId(), i, "");
 			if (unlink(buf))
 				elog(WARNING, "could not unlink file \"%s\": %m", buf);
 		}
@@ -1183,9 +1185,9 @@ ExportSnapshot(Snapshot snapshot)
 	 */
 
 	/*
-	 * This will assign a transaction ID if we do not yet have one.
+	 * Get our transaction ID if there is one, to include in the snapshot.
 	 */
-	topXid = GetTopTransactionId();
+	topXid = GetTopTransactionIdIfAny();
 
 	/*
 	 * We cannot export a snapshot from a subtransaction because there's no
@@ -1226,7 +1228,8 @@ ExportSnapshot(Snapshot snapshot)
 	 */
 	initStringInfo(&buf);
 
-	appendStringInfo(&buf, "xid:%u\n", topXid);
+	appendStringInfo(&buf, "vxid:%d/%u\n", MyProc->backendId, MyProc->lxid);
+	appendStringInfo(&buf, "pid:%d\n", MyProcPid);
 	appendStringInfo(&buf, "dbid:%u\n", MyDatabaseId);
 	appendStringInfo(&buf, "iso:%d\n", XactIsoLevel);
 	appendStringInfo(&buf, "ro:%d\n", XactReadOnly);
@@ -1245,7 +1248,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * xmax.  (We need not make the same check for subxip[] members, see
 	 * snapshot.h.)
 	 */
-	addTopXid = TransactionIdPrecedes(topXid, snapshot->xmax) ? 1 : 0;
+	addTopXid = (TransactionIdIsValid(topXid) &&
+		TransactionIdPrecedes(topXid, snapshot->xmax)) ? 1 : 0;
 	appendStringInfo(&buf, "xcnt:%d\n", snapshot->xcnt + addTopXid);
 	for (i = 0; i < snapshot->xcnt; i++)
 		appendStringInfo(&buf, "xip:%u\n", snapshot->xip[i]);
@@ -1276,7 +1280,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * ensures that no other backend can read an incomplete file
 	 * (ImportSnapshot won't allow it because of its valid-characters check).
 	 */
-	XactExportFilePath(pathtmp, topXid, list_length(exportedSnapshots), ".tmp");
+	XactExportFilePath(pathtmp, MyProc->backendId, MyProc->lxid,
+					   list_length(exportedSnapshots), ".tmp");
 	if (!(f = AllocateFile(pathtmp, PG_BINARY_W)))
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -1298,7 +1303,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * Now that we have written everything into a .tmp file, rename the file
 	 * to remove the .tmp suffix.
 	 */
-	XactExportFilePath(path, topXid, list_length(exportedSnapshots), "");
+	XactExportFilePath(path, MyProc->backendId, MyProc->lxid,
+					   list_length(exportedSnapshots), "");
 
 	if (rename(pathtmp, path) < 0)
 		ereport(ERROR,
@@ -1384,6 +1390,30 @@ parseXidFromText(const char *prefix, char **s, const char *filename)
 	return val;
 }
 
+static void
+parseVxidFromText(const char *prefix, char **s, const char *filename,
+				  VirtualTransactionId *vxid)
+{
+	char	   *ptr = *s;
+	int			prefixlen = strlen(prefix);
+
+	if (strncmp(ptr, prefix, prefixlen) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	ptr += prefixlen;
+	if (sscanf(ptr, "%d/%u", &vxid->backendId, &vxid->localTransactionId) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	ptr = strchr(ptr, '\n');
+	if (!ptr)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	*s = ptr + 1;
+}
+
 /*
  * ImportSnapshot
  *		Import a previously exported snapshot.  The argument should be a
@@ -1399,7 +1429,8 @@ ImportSnapshot(const char *idstr)
 	char	   *filebuf;
 	int			xcnt;
 	int			i;
-	TransactionId src_xid;
+	VirtualTransactionId src_vxid;
+	int			src_pid;
 	Oid			src_dbid;
 	int			src_isolevel;
 	bool		src_readonly;
@@ -1463,7 +1494,8 @@ ImportSnapshot(const char *idstr)
 	 */
 	memset(&snapshot, 0, sizeof(snapshot));
 
-	src_xid = parseXidFromText("xid:", &filebuf, path);
+	parseVxidFromText("vxid:", &filebuf, path, &src_vxid);
+	src_pid = parseIntFromText("pid:", &filebuf, path);
 	/* we abuse parseXidFromText a bit here ... */
 	src_dbid = parseXidFromText("dbid:", &filebuf, path);
 	src_isolevel = parseIntFromText("iso:", &filebuf, path);
@@ -1513,7 +1545,7 @@ ImportSnapshot(const char *idstr)
 	 * don't trouble to check the array elements, just the most critical
 	 * fields.
 	 */
-	if (!TransactionIdIsNormal(src_xid) ||
+	if (!VirtualTransactionIdIsValid(src_vxid) ||
 		!OidIsValid(src_dbid) ||
 		!TransactionIdIsNormal(snapshot.xmin) ||
 		!TransactionIdIsNormal(snapshot.xmax))
@@ -1554,7 +1586,7 @@ ImportSnapshot(const char *idstr)
 			  errmsg("cannot import a snapshot from a different database")));
 
 	/* OK, install the snapshot */
-	SetTransactionSnapshot(&snapshot, src_xid, NULL);
+	SetTransactionSnapshot(&snapshot, &src_vxid, src_pid, NULL);
 }
 
 /*
@@ -2141,5 +2173,5 @@ RestoreSnapshot(char *start_address)
 void
 RestoreTransactionSnapshot(Snapshot snapshot, void *master_pgproc)
 {
-	SetTransactionSnapshot(snapshot, InvalidTransactionId, master_pgproc);
+	SetTransactionSnapshot(snapshot, NULL, InvalidPid, master_pgproc);
 }
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 8f9ea29..941ba71 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -14,6 +14,7 @@
 #ifndef PREDICATE_H
 #define PREDICATE_H
 
+#include "storage/lock.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
 
@@ -46,7 +47,8 @@ extern bool PageIsPredicateLocked(Relation relation, BlockNumber blkno);
 /* predicate lock maintenance */
 extern Snapshot GetSerializableTransactionSnapshot(Snapshot snapshot);
 extern void SetSerializableTransactionSnapshot(Snapshot snapshot,
-								   TransactionId sourcexid);
+								   VirtualTransactionId *sourcevxid,
+								   int sourcepid);
 extern void RegisterPredicateLockingXid(TransactionId xid);
 extern void PredicateLockRelation(Relation relation, Snapshot snapshot);
 extern void PredicateLockPage(Relation relation, BlockNumber blkno, Snapshot snapshot);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 22955a7..5cf8ff7 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -82,7 +82,7 @@ extern int	GetMaxSnapshotSubxidCount(void);
 extern Snapshot GetSnapshotData(Snapshot snapshot);
 
 extern bool ProcArrayInstallImportedXmin(TransactionId xmin,
-							 TransactionId sourcexid);
+							 VirtualTransactionId *sourcevxid);
 extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
 
 extern RunningTransactions GetRunningTransactionData(void);
diff --git a/src/include/storage/sinvaladt.h b/src/include/storage/sinvaladt.h
index 07ac2f8..df98ee4 100644
--- a/src/include/storage/sinvaladt.h
+++ b/src/include/storage/sinvaladt.h
@@ -39,5 +39,6 @@ extern int	SIGetDataEntries(SharedInvalidationMessage *data, int datasize);
 extern void SICleanupQueue(bool callerHasWriteLock, int minFree);
 
 extern LocalTransactionId GetNextLocalTransactionId(void);
+extern LocalTransactionId GetLastLocalTransactionId(void);
 
 #endif   /* SINVALADT_H */
-- 
2.7.4

#25Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#24)
Re: logical replication busy-waiting on a lock

Hi,

On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
but I don't have better solution there.

I dislike that quite a bit - how about we instead just change the
information we keep in exportedSnapshots? I.e. store a pointer to an
struct ExportedSnapshot
{
char *exportedAs;
Snapshot snapshot;
}

Then we can get rid of that relatively ugly list_length() business too.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#25)
Re: logical replication busy-waiting on a lock

On 09/06/17 01:06, Andres Freund wrote:

Hi,

On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
but I don't have better solution there.

I dislike that quite a bit - how about we instead just change the
information we keep in exportedSnapshots? I.e. store a pointer to an
struct ExportedSnapshot
{
char *exportedAs;
Snapshot snapshot;
}

Then we can get rid of that relatively ugly list_length() business too.

That sounds reasonable, I'll do that.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Petr Jelinek (#26)
2 attachment(s)
Re: logical replication busy-waiting on a lock

On 09/06/17 03:20, Petr Jelinek wrote:

On 09/06/17 01:06, Andres Freund wrote:

Hi,

On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
but I don't have better solution there.

I dislike that quite a bit - how about we instead just change the
information we keep in exportedSnapshots? I.e. store a pointer to an
struct ExportedSnapshot
{
char *exportedAs;
Snapshot snapshot;
}

Then we can get rid of that relatively ugly list_length() business too.

That sounds reasonable, I'll do that.

And here it is, seems better (the 0002 is same as before).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

v2-0002-Don-t-assign-xid-to-logical-decoding-snapshots.patchtext/x-patch; name=v2-0002-Don-t-assign-xid-to-logical-decoding-snapshots.patchDownload
From f6df47e8257cbce4a78ceeeac504c9fe86527b05 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:07:49 +0200
Subject: [PATCH 2/2] Don't assign xid to logical decoding snapshots

---
 src/backend/replication/logical/snapbuild.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 8848f5b..e06aa09 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -262,7 +262,7 @@ static bool ExportInProgress = false;
 static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
-static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid);
+static Snapshot SnapBuildBuildSnapshot(SnapBuild *builder);
 
 static void SnapBuildFreeSnapshot(Snapshot snap);
 
@@ -463,7 +463,7 @@ SnapBuildSnapDecRefcount(Snapshot snap)
  * and ->subxip/subxcnt values.
  */
 static Snapshot
-SnapBuildBuildSnapshot(SnapBuild *builder, TransactionId xid)
+SnapBuildBuildSnapshot(SnapBuild *builder)
 {
 	Snapshot	snapshot;
 	Size		ssize;
@@ -562,7 +562,7 @@ SnapBuildInitialSnapshot(SnapBuild *builder)
 	if (TransactionIdIsValid(MyPgXact->xmin))
 		elog(ERROR, "cannot build an initial slot snapshot when MyPgXact->xmin already is valid");
 
-	snap = SnapBuildBuildSnapshot(builder, GetTopTransactionId());
+	snap = SnapBuildBuildSnapshot(builder);
 
 	/*
 	 * We know that snap->xmin is alive, enforced by the logical xmin
@@ -679,7 +679,7 @@ SnapBuildGetOrBuildSnapshot(SnapBuild *builder, TransactionId xid)
 	/* only build a new snapshot if we don't have a prebuilt one */
 	if (builder->snapshot == NULL)
 	{
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 		/* increase refcount for the snapshot builder */
 		SnapBuildSnapIncRefcount(builder->snapshot);
 	}
@@ -743,7 +743,7 @@ SnapBuildProcessChange(SnapBuild *builder, TransactionId xid, XLogRecPtr lsn)
 		/* only build a new snapshot if we don't have a prebuilt one */
 		if (builder->snapshot == NULL)
 		{
-			builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+			builder->snapshot = SnapBuildBuildSnapshot(builder);
 			/* increase refcount for the snapshot builder */
 			SnapBuildSnapIncRefcount(builder->snapshot);
 		}
@@ -1061,7 +1061,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		if (builder->snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		builder->snapshot = SnapBuildBuildSnapshot(builder);
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1831,7 +1831,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	{
 		SnapBuildSnapDecRefcount(builder->snapshot);
 	}
-	builder->snapshot = SnapBuildBuildSnapshot(builder, InvalidTransactionId);
+	builder->snapshot = SnapBuildBuildSnapshot(builder);
 	SnapBuildSnapIncRefcount(builder->snapshot);
 
 	ReorderBufferSetRestartPoint(builder->reorder, lsn);
-- 
2.7.4

v2-0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patchtext/x-patch; name=v2-0001-Use-virtual-transaction-instead-of-normal-ones-in-ex.patchDownload
From aaf6a332bd5010a6f4f6fd113c703a26533d5737 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmodos@pjmodos.net>
Date: Sat, 3 Jun 2017 02:06:22 +0200
Subject: [PATCH 1/2] Use virtual transaction instead of normal ones in
 exported snapshots

---
 doc/src/sgml/ref/set_transaction.sgml |   8 +--
 src/backend/storage/ipc/procarray.c   |  11 ++--
 src/backend/storage/lmgr/predicate.c  |  26 ++++----
 src/backend/utils/time/snapmgr.c      | 121 ++++++++++++++++++++++------------
 src/include/storage/predicate.h       |   4 +-
 src/include/storage/procarray.h       |   2 +-
 6 files changed, 109 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/ref/set_transaction.sgml b/doc/src/sgml/ref/set_transaction.sgml
index ca55a5b..116315f 100644
--- a/doc/src/sgml/ref/set_transaction.sgml
+++ b/doc/src/sgml/ref/set_transaction.sgml
@@ -221,9 +221,9 @@ SET SESSION CHARACTERISTICS AS TRANSACTION <replaceable class="parameter">transa
 <programlisting>
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 SELECT pg_export_snapshot();
- pg_export_snapshot
---------------------
- 000003A1-1
+ pg_export_snapshot  
+---------------------
+ 00000003-0000001B-1
 (1 row)
 </programlisting>
 
@@ -233,7 +233,7 @@ SELECT pg_export_snapshot();
 
 <programlisting>
 BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-SET TRANSACTION SNAPSHOT '000003A1-1';
+SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
 </programlisting></para>
  </refsect1>
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 8a71536..dfddfc4 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1793,14 +1793,15 @@ GetSnapshotData(Snapshot snapshot)
  * Returns TRUE if successful, FALSE if source xact is no longer running.
  */
 bool
-ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
+ProcArrayInstallImportedXmin(TransactionId xmin,
+							 VirtualTransactionId *sourcevxid)
 {
 	bool		result = false;
 	ProcArrayStruct *arrayP = procArray;
 	int			index;
 
 	Assert(TransactionIdIsNormal(xmin));
-	if (!TransactionIdIsNormal(sourcexid))
+	if (!sourcevxid)
 		return false;
 
 	/* Get lock so source xact can't end while we're doing this */
@@ -1817,8 +1818,10 @@ ProcArrayInstallImportedXmin(TransactionId xmin, TransactionId sourcexid)
 		if (pgxact->vacuumFlags & PROC_IN_VACUUM)
 			continue;
 
-		xid = pgxact->xid;		/* fetch just once */
-		if (xid != sourcexid)
+		/* We are only interested in the specific virtual transaction. */
+		if (proc->backendId != sourcevxid->backendId)
+			continue;
+		if (proc->lxid != sourcevxid->localTransactionId)
 			continue;
 
 		/*
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 27c4af9..bce505a 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -148,7 +148,7 @@
  * predicate lock maintenance
  *		GetSerializableTransactionSnapshot(Snapshot snapshot)
  *		SetSerializableTransactionSnapshot(Snapshot snapshot,
- *										   TransactionId sourcexid)
+ *										   VirtualTransactionId *sourcevxid)
  *		RegisterPredicateLockingXid(void)
  *		PredicateLockRelation(Relation relation, Snapshot snapshot)
  *		PredicateLockPage(Relation relation, BlockNumber blkno,
@@ -434,7 +434,8 @@ static uint32 predicatelock_hash(const void *key, Size keysize);
 static void SummarizeOldestCommittedSxact(void);
 static Snapshot GetSafeSnapshot(Snapshot snapshot);
 static Snapshot GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-									  TransactionId sourcexid);
+									  VirtualTransactionId *sourcevxid,
+									  int sourcepid);
 static bool PredicateLockExists(const PREDICATELOCKTARGETTAG *targettag);
 static bool GetParentPredicateLockTag(const PREDICATELOCKTARGETTAG *tag,
 						  PREDICATELOCKTARGETTAG *parent);
@@ -1510,7 +1511,7 @@ GetSafeSnapshot(Snapshot origSnapshot)
 		 * one passed to it, but we avoid assuming that here.
 		 */
 		snapshot = GetSerializableTransactionSnapshotInt(origSnapshot,
-													   InvalidTransactionId);
+														 NULL, InvalidPid);
 
 		if (MySerializableXact == InvalidSerializableXact)
 			return snapshot;	/* no concurrent r/w xacts; it's safe */
@@ -1643,7 +1644,7 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
 		return GetSafeSnapshot(snapshot);
 
 	return GetSerializableTransactionSnapshotInt(snapshot,
-												 InvalidTransactionId);
+												 NULL, InvalidPid);
 }
 
 /*
@@ -1658,7 +1659,8 @@ GetSerializableTransactionSnapshot(Snapshot snapshot)
  */
 void
 SetSerializableTransactionSnapshot(Snapshot snapshot,
-								   TransactionId sourcexid)
+								   VirtualTransactionId *sourcevxid,
+								   int sourcepid)
 {
 	Assert(IsolationIsSerializable());
 
@@ -1673,7 +1675,8 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("a snapshot-importing transaction must not be READ ONLY DEFERRABLE")));
 
-	(void) GetSerializableTransactionSnapshotInt(snapshot, sourcexid);
+	(void) GetSerializableTransactionSnapshotInt(snapshot, sourcevxid,
+												 sourcepid);
 }
 
 /*
@@ -1687,7 +1690,8 @@ SetSerializableTransactionSnapshot(Snapshot snapshot,
  */
 static Snapshot
 GetSerializableTransactionSnapshotInt(Snapshot snapshot,
-									  TransactionId sourcexid)
+									  VirtualTransactionId *sourcevxid,
+									  int sourcepid)
 {
 	PGPROC	   *proc;
 	VirtualTransactionId vxid;
@@ -1741,17 +1745,17 @@ GetSerializableTransactionSnapshotInt(Snapshot snapshot,
 	} while (!sxact);
 
 	/* Get the snapshot, or check that it's safe to use */
-	if (!TransactionIdIsValid(sourcexid))
+	if (!sourcevxid)
 		snapshot = GetSnapshotData(snapshot);
-	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(snapshot->xmin, sourcevxid))
 	{
 		ReleasePredXact(sxact);
 		LWLockRelease(SerializableXactHashLock);
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+			errdetail("The source process with pid %d is not running anymore.",
+						sourcepid)));
 	}
 
 	/*
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index b3d4fe3..7e45400 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -58,6 +58,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "storage/sinval.h"
+#include "storage/sinvaladt.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -211,11 +212,15 @@ static Snapshot FirstXactSnapshot = NULL;
 
 /* Define pathname of exported-snapshot files */
 #define SNAPSHOT_EXPORT_DIR "pg_snapshots"
-#define XactExportFilePath(path, xid, num, suffix) \
-	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d%s", \
-			 xid, num, suffix)
 
-/* Current xact's exported snapshots (a list of Snapshot structs) */
+/* Structure holding info about exported snapshot. */
+typedef struct ExportedSnapshot
+{
+	char *snapfile;
+	Snapshot snapshot;
+} ExportedSnapshot;
+
+/* Current xact's exported snapshots (a list of ExportedSnapshot structs) */
 static List *exportedSnapshots = NIL;
 
 /* Prototypes for local functions */
@@ -558,8 +563,8 @@ SnapshotSetCommandId(CommandId curcid)
  * in GetTransactionSnapshot.
  */
 static void
-SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
-					   PGPROC *sourceproc)
+SetTransactionSnapshot(Snapshot sourcesnap, VirtualTransactionId *sourcevxid,
+					   int sourcepid, PGPROC *sourceproc)
 {
 	/* Caller should have checked this already */
 	Assert(!FirstSnapshotSet);
@@ -617,12 +622,12 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 					 errmsg("could not import the requested snapshot"),
 			   errdetail("The source transaction is not running anymore.")));
 	}
-	else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcexid))
+	else if (!ProcArrayInstallImportedXmin(CurrentSnapshot->xmin, sourcevxid))
 		ereport(ERROR,
 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 				 errmsg("could not import the requested snapshot"),
-			   errdetail("The source transaction %u is not running anymore.",
-						 sourcexid)));
+			errdetail("The source process with pid %d is not running anymore.",
+						sourcepid)));
 
 	/*
 	 * In transaction-snapshot mode, the first snapshot must live until end of
@@ -632,7 +637,8 @@ SetTransactionSnapshot(Snapshot sourcesnap, TransactionId sourcexid,
 	if (IsolationUsesXactSnapshot())
 	{
 		if (IsolationIsSerializable())
-			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcexid);
+			SetSerializableTransactionSnapshot(CurrentSnapshot, sourcevxid,
+											   sourcepid);
 		/* Make a saved copy */
 		CurrentSnapshot = CopySnapshot(CurrentSnapshot);
 		FirstXactSnapshot = CurrentSnapshot;
@@ -1075,33 +1081,29 @@ AtEOXact_Snapshot(bool isCommit, bool resetXmin)
 	 */
 	if (exportedSnapshots != NIL)
 	{
-		TransactionId myxid = GetTopTransactionId();
-		int			i;
-		char		buf[MAXPGPATH];
 		ListCell   *lc;
 
 		/*
 		 * Get rid of the files.  Unlink failure is only a WARNING because (1)
 		 * it's too late to abort the transaction, and (2) leaving a leaked
 		 * file around has little real consequence anyway.
-		 */
-		for (i = 1; i <= list_length(exportedSnapshots); i++)
-		{
-			XactExportFilePath(buf, myxid, i, "");
-			if (unlink(buf))
-				elog(WARNING, "could not unlink file \"%s\": %m", buf);
-		}
-
-		/*
-		 * As with the FirstXactSnapshot, we needn't spend any effort on
-		 * cleaning up the per-snapshot data structures, but we do need to
-		 * remove them from RegisteredSnapshots to prevent a warning below.
+		 *
+		 * We also also need to remove the snapshots from RegisteredSnapshots
+		 * to prevent a warning below.
+		 *
+		 * As with the FirstXactSnapshot, we don't need to free resources of
+		 * the snapshot iself as it will go away with the memory context.
 		 */
 		foreach(lc, exportedSnapshots)
 		{
-			Snapshot	snap = (Snapshot) lfirst(lc);
+			ExportedSnapshot	*esnap = (ExportedSnapshot *) lfirst(lc);
 
-			pairingheap_remove(&RegisteredSnapshots, &snap->ph_node);
+			if (unlink(esnap->snapfile))
+				elog(WARNING, "could not unlink file \"%s\": %m",
+					 esnap->snapfile);
+
+			pairingheap_remove(&RegisteredSnapshots,
+							   &esnap->snapshot->ph_node);
 		}
 
 		exportedSnapshots = NIL;
@@ -1159,6 +1161,7 @@ ExportSnapshot(Snapshot snapshot)
 {
 	TransactionId topXid;
 	TransactionId *children;
+	ExportedSnapshot *esnap;
 	int			nchildren;
 	int			addTopXid;
 	StringInfoData buf;
@@ -1183,9 +1186,9 @@ ExportSnapshot(Snapshot snapshot)
 	 */
 
 	/*
-	 * This will assign a transaction ID if we do not yet have one.
+	 * Get our transaction ID if there is one, to include in the snapshot.
 	 */
-	topXid = GetTopTransactionId();
+	topXid = GetTopTransactionIdIfAny();
 
 	/*
 	 * We cannot export a snapshot from a subtransaction because there's no
@@ -1205,15 +1208,23 @@ ExportSnapshot(Snapshot snapshot)
 	nchildren = xactGetCommittedChildren(&children);
 
 	/*
+	 * Generate file path for the snapshot.  We start numbering of snapshots
+	 * inside the transaction from 1.
+	 */
+	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d",
+			 topXid, list_length(exportedSnapshots) + 1);
+
+	/*
 	 * Copy the snapshot into TopTransactionContext, add it to the
 	 * exportedSnapshots list, and mark it pseudo-registered.  We do this to
 	 * ensure that the snapshot's xmin is honored for the rest of the
 	 * transaction.
 	 */
-	snapshot = CopySnapshot(snapshot);
-
 	oldcxt = MemoryContextSwitchTo(TopTransactionContext);
-	exportedSnapshots = lappend(exportedSnapshots, snapshot);
+	esnap = (ExportedSnapshot *) palloc(sizeof(ExportedSnapshot));
+	esnap->snapfile = pstrdup(path);
+	esnap->snapshot = snapshot = CopySnapshot(snapshot);
+	exportedSnapshots = lappend(exportedSnapshots, esnap);
 	MemoryContextSwitchTo(oldcxt);
 
 	snapshot->regd_count++;
@@ -1226,7 +1237,8 @@ ExportSnapshot(Snapshot snapshot)
 	 */
 	initStringInfo(&buf);
 
-	appendStringInfo(&buf, "xid:%u\n", topXid);
+	appendStringInfo(&buf, "vxid:%d/%u\n", MyProc->backendId, MyProc->lxid);
+	appendStringInfo(&buf, "pid:%d\n", MyProcPid);
 	appendStringInfo(&buf, "dbid:%u\n", MyDatabaseId);
 	appendStringInfo(&buf, "iso:%d\n", XactIsoLevel);
 	appendStringInfo(&buf, "ro:%d\n", XactReadOnly);
@@ -1245,7 +1257,8 @@ ExportSnapshot(Snapshot snapshot)
 	 * xmax.  (We need not make the same check for subxip[] members, see
 	 * snapshot.h.)
 	 */
-	addTopXid = TransactionIdPrecedes(topXid, snapshot->xmax) ? 1 : 0;
+	addTopXid = (TransactionIdIsValid(topXid) &&
+		TransactionIdPrecedes(topXid, snapshot->xmax)) ? 1 : 0;
 	appendStringInfo(&buf, "xcnt:%d\n", snapshot->xcnt + addTopXid);
 	for (i = 0; i < snapshot->xcnt; i++)
 		appendStringInfo(&buf, "xip:%u\n", snapshot->xip[i]);
@@ -1276,7 +1289,7 @@ ExportSnapshot(Snapshot snapshot)
 	 * ensures that no other backend can read an incomplete file
 	 * (ImportSnapshot won't allow it because of its valid-characters check).
 	 */
-	XactExportFilePath(pathtmp, topXid, list_length(exportedSnapshots), ".tmp");
+	snprintf(pathtmp, sizeof(pathtmp), "%s.tmp", path);
 	if (!(f = AllocateFile(pathtmp, PG_BINARY_W)))
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -1298,8 +1311,6 @@ ExportSnapshot(Snapshot snapshot)
 	 * Now that we have written everything into a .tmp file, rename the file
 	 * to remove the .tmp suffix.
 	 */
-	XactExportFilePath(path, topXid, list_length(exportedSnapshots), "");
-
 	if (rename(pathtmp, path) < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -1384,6 +1395,30 @@ parseXidFromText(const char *prefix, char **s, const char *filename)
 	return val;
 }
 
+static void
+parseVxidFromText(const char *prefix, char **s, const char *filename,
+				  VirtualTransactionId *vxid)
+{
+	char	   *ptr = *s;
+	int			prefixlen = strlen(prefix);
+
+	if (strncmp(ptr, prefix, prefixlen) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	ptr += prefixlen;
+	if (sscanf(ptr, "%d/%u", &vxid->backendId, &vxid->localTransactionId) != 2)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	ptr = strchr(ptr, '\n');
+	if (!ptr)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+				 errmsg("invalid snapshot data in file \"%s\"", filename)));
+	*s = ptr + 1;
+}
+
 /*
  * ImportSnapshot
  *		Import a previously exported snapshot.  The argument should be a
@@ -1399,7 +1434,8 @@ ImportSnapshot(const char *idstr)
 	char	   *filebuf;
 	int			xcnt;
 	int			i;
-	TransactionId src_xid;
+	VirtualTransactionId src_vxid;
+	int			src_pid;
 	Oid			src_dbid;
 	int			src_isolevel;
 	bool		src_readonly;
@@ -1463,7 +1499,8 @@ ImportSnapshot(const char *idstr)
 	 */
 	memset(&snapshot, 0, sizeof(snapshot));
 
-	src_xid = parseXidFromText("xid:", &filebuf, path);
+	parseVxidFromText("vxid:", &filebuf, path, &src_vxid);
+	src_pid = parseIntFromText("pid:", &filebuf, path);
 	/* we abuse parseXidFromText a bit here ... */
 	src_dbid = parseXidFromText("dbid:", &filebuf, path);
 	src_isolevel = parseIntFromText("iso:", &filebuf, path);
@@ -1513,7 +1550,7 @@ ImportSnapshot(const char *idstr)
 	 * don't trouble to check the array elements, just the most critical
 	 * fields.
 	 */
-	if (!TransactionIdIsNormal(src_xid) ||
+	if (!VirtualTransactionIdIsValid(src_vxid) ||
 		!OidIsValid(src_dbid) ||
 		!TransactionIdIsNormal(snapshot.xmin) ||
 		!TransactionIdIsNormal(snapshot.xmax))
@@ -1554,7 +1591,7 @@ ImportSnapshot(const char *idstr)
 			  errmsg("cannot import a snapshot from a different database")));
 
 	/* OK, install the snapshot */
-	SetTransactionSnapshot(&snapshot, src_xid, NULL);
+	SetTransactionSnapshot(&snapshot, &src_vxid, src_pid, NULL);
 }
 
 /*
@@ -2141,5 +2178,5 @@ RestoreSnapshot(char *start_address)
 void
 RestoreTransactionSnapshot(Snapshot snapshot, void *master_pgproc)
 {
-	SetTransactionSnapshot(snapshot, InvalidTransactionId, master_pgproc);
+	SetTransactionSnapshot(snapshot, NULL, InvalidPid, master_pgproc);
 }
diff --git a/src/include/storage/predicate.h b/src/include/storage/predicate.h
index 8f9ea29..941ba71 100644
--- a/src/include/storage/predicate.h
+++ b/src/include/storage/predicate.h
@@ -14,6 +14,7 @@
 #ifndef PREDICATE_H
 #define PREDICATE_H
 
+#include "storage/lock.h"
 #include "utils/relcache.h"
 #include "utils/snapshot.h"
 
@@ -46,7 +47,8 @@ extern bool PageIsPredicateLocked(Relation relation, BlockNumber blkno);
 /* predicate lock maintenance */
 extern Snapshot GetSerializableTransactionSnapshot(Snapshot snapshot);
 extern void SetSerializableTransactionSnapshot(Snapshot snapshot,
-								   TransactionId sourcexid);
+								   VirtualTransactionId *sourcevxid,
+								   int sourcepid);
 extern void RegisterPredicateLockingXid(TransactionId xid);
 extern void PredicateLockRelation(Relation relation, Snapshot snapshot);
 extern void PredicateLockPage(Relation relation, BlockNumber blkno, Snapshot snapshot);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 22955a7..5cf8ff7 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -82,7 +82,7 @@ extern int	GetMaxSnapshotSubxidCount(void);
 extern Snapshot GetSnapshotData(Snapshot snapshot);
 
 extern bool ProcArrayInstallImportedXmin(TransactionId xmin,
-							 TransactionId sourcexid);
+							 VirtualTransactionId *sourcevxid);
 extern bool ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc);
 
 extern RunningTransactions GetRunningTransactionData(void);
-- 
2.7.4

#28Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#27)
Re: logical replication busy-waiting on a lock

On 2017-06-09 22:28:00 +0200, Petr Jelinek wrote:

And here it is, seems better (the 0002 is same as before).

Cool, looks good on a quick scan.

/* Define pathname of exported-snapshot files */
#define SNAPSHOT_EXPORT_DIR "pg_snapshots"
-#define XactExportFilePath(path, xid, num, suffix) \
- snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d%s", \
- xid, num, suffix)

-/* Current xact's exported snapshots (a list of Snapshot structs) */
+/* Structure holding info about exported snapshot. */
+typedef struct ExportedSnapshot
+{
+	char *snapfile;
+	Snapshot snapshot;
+} ExportedSnapshot;
+
+/* Current xact's exported snapshots (a list of ExportedSnapshot structs) */
static List *exportedSnapshots = NIL;

trival quibble: *pointers to

Will take care of it over the weekend.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Andres Freund
andres@anarazel.de
In reply to: Petr Jelinek (#27)
Re: logical replication busy-waiting on a lock

On 2017-06-09 22:28:00 +0200, Petr Jelinek wrote:

On 09/06/17 03:20, Petr Jelinek wrote:

On 09/06/17 01:06, Andres Freund wrote:

Hi,

On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
but I don't have better solution there.

I dislike that quite a bit - how about we instead just change the
information we keep in exportedSnapshots? I.e. store a pointer to an
struct ExportedSnapshot
{
char *exportedAs;
Snapshot snapshot;
}

Then we can get rid of that relatively ugly list_length() business too.

That sounds reasonable, I'll do that.

And here it is, seems better (the 0002 is same as before).

I spent a chunk of time with this. One surprising, but easy to fix issue was
that this patch had the exported file name as:

/*
+	 * Generate file path for the snapshot.  We start numbering of snapshots
+	 * inside the transaction from 1.
+	 */
+	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d",
+			 topXid, list_length(exportedSnapshots) + 1);
+

I.e. just as before, unlike the previous version of the patch which used
virtualxids. Given that we don't require topXid to be set, this seems
to largely break the patch.

Secondly, because of

/*
-	 * This will assign a transaction ID if we do not yet have one.
+	 * Get our transaction ID if there is one, to include in the snapshot.
*/
-	topXid = GetTopTransactionId();
+	topXid = GetTopTransactionIdIfAny();

which is perfectly correct on its face, we actually added a substantial
feature: Previously exporting snapshots was only possible on primaries,
now it's also possible on standbys. The only thing that made that error
out before was the check during xid assignment. Because snapshots work
somewhat differntly on standbys, I spent a good chunk of time staring at
code trying to see whether it'd break anything. Neither code-reading
nor testing seems to suggest any problems. Were we earlier in the
release cycle I'd be perfectly fine with an accidental feature, but I'm
wondering a bit whether we should just make it error out at this point,
because it'd be a new feature. I'm -0.5 on that, but I think it should
be raised.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#29)
Re: logical replication busy-waiting on a lock

On 2017-06-13 00:32:40 -0700, Andres Freund wrote:

On 2017-06-09 22:28:00 +0200, Petr Jelinek wrote:

On 09/06/17 03:20, Petr Jelinek wrote:

On 09/06/17 01:06, Andres Freund wrote:

Hi,

On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:

One thing I don't like is the GetLastLocalTransactionId() that I had to
add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
but I don't have better solution there.

I dislike that quite a bit - how about we instead just change the
information we keep in exportedSnapshots? I.e. store a pointer to an
struct ExportedSnapshot
{
char *exportedAs;
Snapshot snapshot;
}

Then we can get rid of that relatively ugly list_length() business too.

That sounds reasonable, I'll do that.

And here it is, seems better (the 0002 is same as before).

I spent a chunk of time with this. One surprising, but easy to fix issue was
that this patch had the exported file name as:

/*
+	 * Generate file path for the snapshot.  We start numbering of snapshots
+	 * inside the transaction from 1.
+	 */
+	snprintf(path, sizeof(path), SNAPSHOT_EXPORT_DIR "/%08X-%d",
+			 topXid, list_length(exportedSnapshots) + 1);
+

I.e. just as before, unlike the previous version of the patch which used
virtualxids. Given that we don't require topXid to be set, this seems
to largely break the patch.

Secondly, because of

/*
-	 * This will assign a transaction ID if we do not yet have one.
+	 * Get our transaction ID if there is one, to include in the snapshot.
*/
-	topXid = GetTopTransactionId();
+	topXid = GetTopTransactionIdIfAny();

which is perfectly correct on its face, we actually added a substantial
feature: Previously exporting snapshots was only possible on primaries,
now it's also possible on standbys. The only thing that made that error
out before was the check during xid assignment. Because snapshots work
somewhat differntly on standbys, I spent a good chunk of time staring at
code trying to see whether it'd break anything. Neither code-reading
nor testing seems to suggest any problems. Were we earlier in the
release cycle I'd be perfectly fine with an accidental feature, but I'm
wondering a bit whether we should just make it error out at this point,
because it'd be a new feature. I'm -0.5 on that, but I think it should
be raised.

Just to be clear: The patch, after the first point above (which I did),
looks ok. I'm just looking for comments.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#30)
Re: logical replication busy-waiting on a lock

Hi,

On 2017-06-13 00:50:20 -0700, Andres Freund wrote:

Just to be clear: The patch, after the first point above (which I did),
looks ok. I'm just looking for comments.

And pushed.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Petr Jelinek
petr.jelinek@2ndquadrant.com
In reply to: Andres Freund (#31)
Re: logical replication busy-waiting on a lock

On 14/06/17 20:57, Andres Freund wrote:

Hi,

On 2017-06-13 00:50:20 -0700, Andres Freund wrote:

Just to be clear: The patch, after the first point above (which I did),
looks ok. I'm just looking for comments.

And pushed.

Thanks!

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers