Proposal for CSN based snapshots
Given the recent ideas being thrown around about changing how freezing
and the clog are handled, and about MVCC catalog access, I thought I
would write out the ideas I have had about speeding up snapshots, in
case there is an interesting tie-in with the current discussions.
To refresh your memory, the basic idea is to change visibility
determination to be based on a commit sequence number (CSN for short)
- an 8 byte number incremented on every commit, representing the total
ordering of commits. To take a snapshot in this scheme you only need
to know the value of the last assigned CSN: all transactions with a
CSN less than or equal to that number were committed at the time of
the snapshot, everything above wasn't committed. Besides speeding up
snapshot taking, this scheme can also be a building block for
consistent snapshots in a multi-master environment with minimal
communication. Google's Spanner database uses snapshots based on a
similar scheme.
The main tricky part about this scheme is finding the CSN that was
assigned to each XID in the face of arbitrarily long transactions and
snapshots, using only a bounded amount of shared memory. The secondary
tricky part is doing this in a way that doesn't need locks for
visibility determination, as that would kill any hope of a performance
gain.
We need to keep around CSN slots for all currently running
transactions and CSN slots of transactions that are concurrent with
any active CSN based snapshot (xid.csn > min(snapshot.csn)). To do
this I propose the following data structures for the XID-to-CSN
mapping. For the most recently assigned XIDs there is a ringbuffer of
slots that contain the CSN values of the XIDs, or special CSN values
for transactions that haven't completed yet, aborted transactions or
subtransactions. I call this the dense CSN map. Looking up the CSN of
an XID in the ringbuffer is just trivial direct indexing into the
ring buffer.
For long running transactions the ringbuffer may do a full circle
before a transaction commits. Such CSN slots, along with slots that
are needed for older snapshots, are evicted from the dense buffer into
a sorted array of XID-CSN pairs, the sparse mapping. For locking
purposes there are two sparse buffers, one active and the other
inactive (more on that later). Looking up the CSN value of an XID
that has been evicted into the sparse mapping is a matter of
performing a binary search to find the slot and reading the CSN value.
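As a rough illustration of the two lookup paths just described, here is a C sketch; the names (denseCSNMap, XidCsnPair) and the toy sizes are assumptions, not from an actual patch:

```c
#include <stdint.h>
#include <stddef.h>

#define DENSE_MAP_SIZE 16		/* toy size; see the sizing section */

typedef struct
{
	uint32_t	xid;
	uint64_t	csn;
} XidCsnPair;

static uint64_t denseCSNMap[DENSE_MAP_SIZE];

/* Dense map lookup: trivial direct indexing into the ring buffer. */
static uint64_t
dense_lookup(uint32_t xid)
{
	return denseCSNMap[xid % DENSE_MAP_SIZE];
}

/*
 * Sparse map lookup: binary search in a sorted array of xid-csn
 * pairs.  Returns 0 when the xid has no slot, in which case the
 * caller would fall back to the clog.
 */
static uint64_t
sparse_lookup(const XidCsnPair *map, size_t n, uint32_t xid)
{
	size_t		lo = 0,
				hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (map[mid].xid < xid)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < n && map[lo].xid == xid)
		return map[lo].csn;
	return 0;
}
```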
Because snapshots can remain active for an unbounded amount of time
and there can be an unbounded number of active snapshots, even the
sparse mapping can fill up. To handle that case, each backend
advertises its lowest snapshot number, csn_min. When values need to be
evicted from the sparse mapping, they are evicted in CSN order and
written into the CSN log - a series of CSN-XID pairs. Backends that
may still be concerned about those values are then notified that
values they might need have been evicted. Backends with invalidated
snapshots can then convert their snapshots to regular
list-of-concurrent-XIDs snapshots at their leisure.
To convert a CSN based snapshot to an XID based one, a backend would
first scan the shared memory structures for XIDs up to snapshot.xmax
whose CSNs are concurrent with the snapshot and insert those XIDs into
the snapshot, then read in the CSN log starting from the snapshot's
CSN, looking for XIDs less than the snapshot's xmax. After this the
snapshot can be handled like current snapshots are handled.
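The conversion filter described above can be sketched as follows; the snapshot layout and all names here are illustrative assumptions. The same filter serves both the shared-memory scan and the CSN log scan: keep any xid below xmax whose commit CSN is later than the snapshot's CSN (i.e. the transaction is concurrent with the snapshot).

```c
#include <stdint.h>
#include <stddef.h>

typedef struct
{
	uint32_t	xid;
	uint64_t	csn;
} XidCsnEntry;

typedef struct
{
	uint64_t	csn;		/* CSN the snapshot was taken at */
	uint32_t	xmax;
	uint32_t	xip[64];	/* collected concurrent xids */
	int			xcnt;
} ConvertedSnapshot;

static void
collect_concurrent(ConvertedSnapshot *snap,
				   const XidCsnEntry *entries, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		/* no bounds check on xip for brevity */
		if (entries[i].xid < snap->xmax && entries[i].csn > snap->csn)
			snap->xip[snap->xcnt++] = entries[i].xid;
	}
}
```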
A more detailed view of synchronization primitives required for common
operations follows.
Taking a new snapshot
---------------------
Taking a CSN based snapshot under this scheme would consist of reading
xmin, csn and xmax from global variables, unlocked and in that order,
with read barriers between each load. If this is our oldest snapshot
we write our csn_min into pgproc, issue a full memory barrier and
check from a global variable that the CSN we used is still guaranteed
not to be evicted (exceedingly unlikely, but it cannot be ruled out).
The read barrier between reading xmin and csn is needed so that the
guarantee holds that no transaction with tx.xid < ss.xmin could have
committed with tx.csn >= ss.csn, so xmin can be used to safely exclude
CSN lookups. The read barrier between reading csn and xmax is needed
to guarantee that if tx.xid >= ss.xmax, then it's known that tx.csn >=
ss.csn. On the write side, there needs to be at least one full memory
barrier between GetNewTransactionId updating nextXid and
CommitTransaction updating nextCsn, which is quite easily satisfied.
Updating the global xmin without races is slightly trickier but doable.
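A minimal C11-atomics sketch of this read sequence, with acquire loads standing in for the read barriers; all variable names and the recheck condition are assumptions, and the "is this our oldest snapshot" test is omitted:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t globalXmin;
static _Atomic uint64_t nextCsn;
static _Atomic uint32_t nextXid;
static _Atomic uint64_t csnEvictionHorizon;	/* oldest CSN still safe */

typedef struct
{
	uint32_t	xmin;
	uint64_t	csn;
	uint32_t	xmax;
} CsnSnapshot;

/* Returns 1 on success, 0 when the CSN may have been evicted (retry). */
static int
take_csn_snapshot(CsnSnapshot *snap, _Atomic uint64_t *my_csn_min)
{
	/* read xmin, then csn, then xmax, ordered by acquire loads */
	snap->xmin = atomic_load_explicit(&globalXmin, memory_order_acquire);
	snap->csn = atomic_load_explicit(&nextCsn, memory_order_acquire);
	snap->xmax = atomic_load_explicit(&nextXid, memory_order_acquire);

	/* advertise our oldest snapshot, full barrier, then recheck */
	atomic_store_explicit(my_csn_min, snap->csn, memory_order_seq_cst);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load(&csnEvictionHorizon) > snap->csn)
		return 0;				/* exceedingly unlikely: must retry */
	return 1;
}
```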
Checking visibility
-------------------
XidInMVCCSnapshot will need to switch on the type of snapshot used
after checking xmin and xmax. For list-of-XIDs snapshots the current
logic applies. CSN based snapshots need to first do an unlocked read
of a backend specific global variable to check if we have been
instructed to convert our snapshots to XID based ones; if so, the
snapshot is converted (more on that separately). Otherwise we look up
the CSN of the xid.
To map an xid to a csn, the value in the csn slot corresponding to the
xid is first read from the dense map (just a plain denseCSNMap[xid %
denseMapSize]). Then, after issuing a read barrier, the dense map
eviction horizon is checked to verify that the value we read was in
fact valid. If it was, it can be compared to the snapshot's csn value
to get the result; if not, we continue to the sparse map.
To check the sparse map, we read the sparse map version counter and
use it to determine which of the two maps is currently active (an
optimistic locking scheme). We use linear or binary search to look up
the slot for the XID. If the XID is not found we know that it wasn't
committed after the snapshot was taken; we then have to check the clog
to see whether it was committed at all. Otherwise we compare the value
against our snapshot's csn to determine visibility. Finally, after
issuing a read barrier, we check the sparse map version counter to see
if the result is valid.
Checking against the dense map can omit clog checks, as we can use
special values to signify aborted and uncommitted transactions. Better
yet, we can defer clog updates until the corresponding slots are
evicted from the dense map.
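The dense-then-sparse lookup with the optimistic version check might look roughly like this; every name here is an illustrative assumption, and the special-value handling for aborted/uncommitted slots is left out:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DENSE_SIZE 16
#define SPARSE_SIZE 8

typedef struct
{
	uint32_t	xid;
	uint64_t	csn;
} SparseSlot;

static _Atomic uint64_t denseMap[DENSE_SIZE];
static _Atomic uint32_t denseEvictionHorizon;	/* oldest xid still dense */
static _Atomic uint32_t sparseVersion;	/* low bit picks the active buffer */
static SparseSlot sparseMaps[2][SPARSE_SIZE];
static int	sparseCounts[2];

/* Returns false when the xid has no slot and the caller must check clog. */
static bool
xid_to_csn(uint32_t xid, uint64_t *csn)
{
	/* optimistically read the dense slot ... */
	uint64_t	val = atomic_load_explicit(&denseMap[xid % DENSE_SIZE],
										   memory_order_acquire);

	/* ... then verify it had not been evicted underneath us */
	if (xid >= atomic_load(&denseEvictionHorizon))
	{
		*csn = val;
		return true;
	}

	/* optimistic read of the active sparse map, retry on concurrent swap */
	for (;;)
	{
		uint32_t	ver = atomic_load_explicit(&sparseVersion,
											   memory_order_acquire);
		const SparseSlot *map = sparseMaps[ver % 2];
		int			n = sparseCounts[ver % 2];
		bool		found = false;

		for (int i = 0; i < n; i++)	/* linear search for brevity */
		{
			if (map[i].xid == xid)
			{
				*csn = map[i].csn;
				found = true;
				break;
			}
		}
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load(&sparseVersion) == ver)
			return found;		/* nobody swapped buffers: result valid */
	}
}
```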
Assigning XIDs and evicting from CSN buffers
--------------------------------------------
XID assignment will need an additional step to allocate a CSN slot for
the transaction. If the dense map ring buffer has filled up, this
might require evicting some entries from the dense CSN map. For less
contention and better efficiency it would be a good idea to evict
larger blocks at a time; one eighth or one sixteenth of the dense map
at a time might be a good balance here. Eviction will be done under a
new CSNEvictionLock.
First we publish the point up to which we are evicting from the dense
map, to notify committing backends of the hazard. We then read the
current nextCSN value and publish it as the largest possible
global_csn_min value we can arrive at, so that a backend in the middle
of taking a snapshot, one that has fetched a CSN but hasn't yet
updated its procarray entry, will notice that it is at risk and will
retry. Using nextCSN is very conservative, but as far as I can see the
invalidations will be rare and cheap. We have to issue a full memory
barrier here so that either the snapshot taker sees our value or we
see its csn_min.
If there is enough space, we just use global_csn_min as the sparse map
eviction horizon. If there is not enough free space in the currently
active sparse map to guarantee that the dense map block will fit, we
scan both the active sparse array and the to-be-evicted block,
collecting the missing space's worth of xid-csn pairs with the
smallest CSN values into a heap, shrinking the heap for every xid-csn
pair we can omit because the transaction was aborted or because it is
visible to all active snapshots. When we finish scanning, if the heap
isn't empty, the largest value in the heap is the sparse map eviction
horizon.
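A sketch of the heap-based horizon selection, under the simplifying assumptions that no candidates get filtered out and that there are at least k candidates (n >= k); the function names are mine, purely for illustration:

```c
#include <stdint.h>
#include <stddef.h>

static void
sift_down(uint64_t *heap, size_t n, size_t i)
{
	for (;;)
	{
		size_t		l = 2 * i + 1,
					r = 2 * i + 2,
					big = i;

		if (l < n && heap[l] > heap[big])
			big = l;
		if (r < n && heap[r] > heap[big])
			big = r;
		if (big == i)
			break;
		uint64_t	tmp = heap[i];

		heap[i] = heap[big];
		heap[big] = tmp;
		i = big;
	}
}

/*
 * Keep the k smallest CSNs seen so far in a max-heap; after scanning
 * all candidates the heap's root (the largest of the k smallest) is
 * the eviction horizon.
 */
static uint64_t
eviction_horizon(const uint64_t *csns, size_t n, size_t k, uint64_t *heap)
{
	/* seed the heap with the first k candidates and heapify */
	for (size_t i = 0; i < k; i++)
		heap[i] = csns[i];
	for (size_t j = k / 2; j-- > 0;)
		sift_down(heap, k, j);

	/* replace the root whenever a smaller CSN comes along */
	for (size_t i = k; i < n; i++)
	{
		if (csns[i] < heap[0])
		{
			heap[0] = csns[i];
			sift_down(heap, k, 0);
		}
	}
	return heap[0];
}
```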
After arriving at the correct sparse map eviction horizon, we iterate
through the sparse map and the dense map block to be evicted, copying
all active or not-all-visible CSN slots to the inactive sparse map. We
also update clog for every committed transaction that we found in the
dense map. If we decided to evict some values, we write them to the
CSN log here and update the sparse map eviction horizon with the CSN
we arrived at. At this point the state in the current active sparse
map and the evictable dense map block are duplicated into the inactive
sparse map and CSN log. We now need to make the new state visible to
visibility checkers in a safe order, issuing a write barrier before
each step so the previous changes are visible:
* Notify all backends that have csn_min <= sparse map eviction horizon
that their snapshots are invalid and at what CSN log location they can
start to read to find concurrent xids.
* Increment sparse map version counter to switch sparse maps.
* Raise the dense map eviction horizon, to free up the dense map block.
* Overwrite the dense map block with special UncommittedCSN values
that are tagged with their XID (more on that later).
At this point we can consider the block cleared up for further use.
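The publication sequence above can be sketched as follows, with release stores standing in for the write barriers; the XID-tagged UncommittedCSN encoding and all names are assumptions, and the backend notification step is omitted:

```c
#include <stdatomic.h>
#include <stdint.h>

#define DENSE_SIZE 16
#define UNCOMMITTED_CSN(xid) ((UINT64_C(1) << 63) | (uint64_t) (xid))

static _Atomic uint32_t sparseVersion;
static _Atomic uint32_t denseEvictionHorizon;
static _Atomic uint64_t denseMap[DENSE_SIZE];

static void
publish_eviction(uint32_t new_horizon,
				 uint32_t block_start, uint32_t block_end)
{
	/* (step 0, not shown: notify backends with invalidated snapshots) */

	/* switch the active sparse map */
	atomic_fetch_add_explicit(&sparseVersion, 1, memory_order_release);

	/* raise the dense map eviction horizon, freeing the evicted block */
	atomic_store_explicit(&denseEvictionHorizon, new_horizon,
						  memory_order_release);

	/* clear the block with XID-tagged UncommittedCSN values for reuse */
	for (uint32_t xid = block_start; xid < block_end; xid++)
		atomic_store_explicit(&denseMap[xid % DENSE_SIZE],
							  UNCOMMITTED_CSN(xid), memory_order_release);
}
```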
Because we don't want to lock shared structures for snapshotting, we
need to maintain a global xmin value. To do this we acquire a spinlock
on the global xmin value and check if it's empty; if no other
transaction is running we replace it with our xid.
At this point we know the minimum CSN of any unconverted snapshots, so
it's also a good time to clean up unnecessary CSN log.
Finally we are done, so we can release CSNEvictionLock and XIDGenLock.
Committing a transaction
------------------------
Most of the commit sequence remains the same, but where we currently
do ProcArrayEndTransaction, we will acquire a new LWLock protecting
the nextCSN value; I'll call it CSNGenLock for now. We first read the
dense array eviction horizon, then stamp all of our subtransaction
slots that are non-overflowed or still in the dense map with the
special SubcommittedCSN value, then stamp the top level xid with
nextCSN. We then issue a full memory barrier and check the
soon-to-be-evicted pointer into the dense map. If it overlaps with any
of the XIDs we just tagged, then the backend doing the eviction might
have missed our update; we have to wait for CSNEvictionLock to become
free, then go and restamp the relevant XIDs in the sparse map.
To stamp an XID with the commit CSN, we compare the XID to the dense
map eviction horizon. If the XID still maps to the dense array, we use
CAS to swap out UncommittedCSN(xid) for the value we need. If the CAS
fails, then between reading the dense array horizon and the actual
stamping, an eviction process replaced our slot with a newer one. (If
we didn't tag the slots with the XID value, we might accidentally
stamp another transaction's slot.) If the XID maps to the sparse
array, we have to take CSNEvictionLock so the sparse array doesn't get
swapped out underneath us, then look up the slot, stamp it and update
the clog before releasing the lock. Lock contention shouldn't be a
problem here as only long running transactions map to the sparse
array.
When done with the stamping we check the global xmin value; if it's
our xid, we grab the spinlock, scan the procarray for the next xmin
value, update it and release the lock.
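The CAS-based stamping might look like this; the XID-tagged UncommittedCSN encoding is an assumption, but it shows why the tagging matters: the CAS can only succeed against our own slot, so a concurrent eviction that recycled the slot makes the CAS fail instead of corrupting another transaction's slot.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DENSE_SIZE 16
#define UNCOMMITTED_CSN(xid) ((UINT64_C(1) << 63) | (uint64_t) (xid))

static _Atomic uint64_t denseMap[DENSE_SIZE];

/*
 * Returns false when an eviction won the race; the caller must then
 * restamp via the sparse map under CSNEvictionLock.
 */
static bool
stamp_commit_csn(uint32_t xid, uint64_t csn)
{
	uint64_t	expected = UNCOMMITTED_CSN(xid);

	return atomic_compare_exchange_strong(&denseMap[xid % DENSE_SIZE],
										  &expected, csn);
}
```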
Rolling back a transaction
--------------------------
Rolling back a transaction is basically the same as committing, but
the CSN slots need to be stamped with an AbortedCSN.
Subtransactions
---------------
Because of the limited size of the sparse array, we cannot keep open
CSN slots for all of the potentially unbounded number of
subtransactions there. I propose something similar to what is
currently done with PGPROC_MAX_CACHED_SUBXIDS. When we assign xids to
subtransactions above this limit, we tag them in the dense array with
a special OverflowedSubxidCSN value. When evicting subtransactions
from the dense array, non-overflowed subtransaction slots are handled
like regular slots; overflowed slots are discarded. We also keep track
of the lowest and the highest subxid that overflowed; the lowest
overflowed subxid is reset when, before an eviction, the highest
overflowed subxid is lower than the smallest xid in the sparse array
(i.e. we know that the XID region covered by the sparse array doesn't
contain any overflowed subxids). When constructing a regular snapshot
we can then detect that we don't have the full information about
subtransactions and correctly set the overflowed flag on the snapshot.
Similarly, visibility checks can omit subxid lookups for xids missing
from the sparse array when they know the xid can't be overflowed.
Prepared transactions
---------------------
Prepared transactions are handled basically like regular transactions:
when starting up with prepared transactions, they are inserted into
the sparse array, and when they are committed they get stamped with
CSNs and become visible as usual. We just need to account for them
when sizing the sparse array.
Crash safety and checkpoints
----------------------------
Because clog updating is delayed for transactions in the dense map,
checkpoints need to flush the dense array before writing out the clog.
Otherwise the data structures are fully transient and don't need any
special treatment.
Hot standby
-----------
I haven't yet worked out how CSN based snapshots best integrate with
hot standby. As far as I can tell, we could just use the current
KnownAssignedXidsGetAndSetXmin mechanism and get regular snapshots.
But I think there is an opportunity here to get rid of most of or all
of the KnownAssignedXids mechanism if we WAL log the CSNs that are
assigned to commits (or use a side channel as proposed by Robert). The
extra write is not great, but might not be a large problem after the
WAL insertion scaling work is done. Another option would be to use the
LSN of commit record as the next CSN value. The locking in that case
requires some further thought to guarantee that commits are stamped in
the same order as WAL records are inserted, without holding
WALInsertLock for too long. That seems doable by inserting committing
backends into a queue under WALInsertLock and then having them wait
for the transaction in front of them to commit once WALInsertLock has
been released.
Serializable transactions
-------------------------
I won't pretend to be familiar with SSI code, but as far as I can tell
serializable transactions don't need any modifications to work with
the CSN based snapshot scheme. There actually already is a commit
sequence number in the SSI code that could be replaced by the actual
CSN. IIRC one of the problems with serializable transactions on hot
standby was that the transaction visibility order on the standby is
different from the master. If we use CSNs for visibility on the
standby then we can actually provide an identical visibility order.
Required atomic primitives
--------------------------
Besides the copious number of memory barriers required for
correctness, we will need the following lockless primitives:
* 64bit atomic read
* 64bit atomic write
* 64bit CAS
Are there any supported platforms where it would be impossible to have
those? AFAIK everything from 32bit x86 through POWER and MIPS to ARM
can do it. If there are any platforms that can't handle 64bit atomics,
would it be ok to run on them with reduced concurrency/performance?
Sizing the CSN maps
-------------------
CSN maps need to be sized to accommodate the number of backends.
The dense array size should be picked so that most xids commit before
being evicted from the dense map, while the sparse array contains the
slots necessary for long running transactions and for long snapshots
not yet converted to XID based snapshots. I did a few quick
simulations to measure the dynamics. If we assume a power law
distribution of transaction lengths, and snapshots held for the full
length of transactions with no think time, then 16 slots per backend
is enough to make the probability of eviction before commit less than
1e-6, and the probability of being needed at eviction due to a
snapshot about 1:10000. In the real world very long transactions are
more likely than the model predicts, but at least the normal variation
is mostly buffered by this size. 16 slots = 128 bytes per backend,
which ends up as a 12.5KB buffer for the default value of 100
backends, or 125KB for 1000 backends.
The sparse buffer needs to be at least big enough to fit CSN slots for
the xids of all active transactions and non-overflowed
subtransactions. At the current PGPROC_MAX_CACHED_SUBXIDS=64, the
minimum comes out at 16 bytes * (64 + 1) slots * 100 backends =
101.6KB per buffer, or 203KB total in the default configuration.
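The sizing arithmetic above can be double-checked with a couple of helper functions mirroring the text's constants (16 dense CSN slots of 8 bytes per backend; 16-byte alignment-padded sparse slots, one per xid plus cached subxids):

```c
#include <stddef.h>
#include <stdint.h>

static size_t
dense_map_bytes(int backends)
{
	return (size_t) backends * 16 * sizeof(uint64_t);
}

static size_t
sparse_buffer_bytes(int backends, int cached_subxids)
{
	return (size_t) backends * (cached_subxids + 1) * 16;
}
```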
Performance discussion
----------------------
Taking a snapshot is extremely cheap in this scheme; I think the cost
will mostly be in publishing the csn_min and rechecking validity after
it. Taking snapshots in the shadow of another snapshot (often the case
for the proposed MVCC catalog access) will be even cheaper, as we
don't have to advertise the new snapshot. The delayed XID based
snapshot construction should be rare, but even when it isn't, the
costs are similar to GetSnapshotData, and we don't need to hold a
lock. Visibility checking will also be simpler, as for the cases where
the snapshot is covered by the dense array it only requires two memory
lookups and comparisons.
The main extra work for writing transactions is the eviction process.
The amortized worst case extra work per xid is dominated by copying
the sparse buffer back and forth and spooling out the CSN log. We need
to write out 16 bytes per xid and copy (sparse buffer size / eviction
block size) sparse buffer slots. If we evict 1/8th of the dense map at
each eviction, this works out to 520 bytes copied per xid assigned -
about the same ballpark as GetSnapshotData is now.
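The 520 bytes/xid figure checks out with the same constants as in the sizing section, assuming 100 backends: a 104000-byte sparse buffer copied once per eviction, and 1/8th of the 1600-slot dense map (200 xids) freed per eviction:

```c
#include <stddef.h>

/* amortized sparse-buffer bytes copied per assigned xid */
static size_t
bytes_copied_per_xid(size_t sparse_buffer_size, size_t eviction_block_xids)
{
	return sparse_buffer_size / eviction_block_xids;
}
```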
With the described scheme, long snapshots will cause the sparse buffer
to quickly fill up and then spool out into the CSN log until the
backend wakes up, converts its snapshots and releases the eviction
process from writing the log. It would be more efficient to be
slightly more proactive and tell backends to convert their snapshots
earlier, so that if they manage to convert in time we can avoid
writing any CSN log.
I'm not particularly pleased about the fact that both xid assignment
and committing can block behind the eviction lock. On the other hand,
plugging in some numbers, with 100 backends doing 100k xid
assignments/s the lock will be acquired 1000 times per second for less
than 100us at a time. The contention might not be bad enough to
warrant extra complexity to deal with it. If it does happen to be a
problem, then I have some ideas how to cope with it.
Having to do CSN log writes while holding an LWLock might not be the
best of ideas. To combat that, we can either add one more buffer so
that the actual write syscall can happen after we release
CSNEvictionLock, or we can reuse the SLRU machinery to handle this.
Overall it looks to be a big win for typical workloads. Workloads
using large amounts of subtransactions might not be as well off, but I
doubt there will be a regression.
At this point I don't see any major issues with this approach. If the
ensuing discussion doesn't find any major showstoppers then I will
start to work on a patch bit-by-bit. It might take a while though as
my free hacking time has been severely cut down since we have a small
one in the family.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Ants,
On 06/07/2013 12:42 AM, Ants Aasma wrote:
Given the recent ideas being thrown about changing how freezing and
clog is handled and MVCC catalog access I thought I would write out
the ideas that I have had about speeding up snapshots in case there is
an interesting tie in with the current discussions.
Thanks for this write-up.
To refresh your memory the basic idea is to change visibility
determination to be based on a commit sequence number (CSN for short)
- a 8 byte number incremented on every commit representing the total
ordering of commits. To take a snapshot in this scheme you only need
to know the value of last assigned CSN, all transactions with XID less
You mean with a *CSN* less than or equal to that number? There's
certainly no point in comparing a CSN with an XID.
than or equal to that number were commited at the time of the
snapshots, everything above wasn't committed. Besides speeding up
snapshot taking, this scheme can also be a building block for
consistent snapshots in a multi-master environment with minimal
communication.
Agreed. Postgres-R uses a CommitOrderId, which is very similar in
concept, for example.
The main tricky part about this scheme is finding the CSN that was
assigned to each XIDs in face of arbitrarily long transactions and
snapshots using only a bounded amount of shared memory. The secondary
tricky part is doing this in a way that doesn't need locks for
visibility determination as that would kill any hope of a performance
gain.
Agreed. Para-phrasing, you can also say that CSNs can only ever identify
completed transactions, where XIDs can be used to identify transactions
in progress as well.
[ We cannot possibly get rid of XIDs entirely for that reason. And the
mapping between CSNs and XIDs has some additional cost. ]
We need to keep around CSN slots for all currently running
transactions and CSN slots of transactions that are concurrent with
any active CSN based snapshot (xid.csn > min(snapshot.csn)). To do
this I propose the following datastructures to do the XID-to-CSN
mapping. For most recently assigned XIDs there is a ringbuffer of
slots that contain the CSN values of the XIDs or special CSN values
for transactions that haven't completed yet, aborted transactions or
subtransactions. I call this the dense CSN map. Looking up a CSN of a
XID from the ringbuffer is just a trivial direct indexing into the
ring buffer.
For long running transactions the ringbuffer may do a full circle
before a transaction commits. Such CSN slots along with slots that are
needed for older snapshots are evicted from the dense buffer into a
sorted array of XID-CSN pairs, or the sparse mapping. For locking
purposes there are two sparse buffers, one of them being active the
other inactive, more on that later. Looking up the CSN value of a XID
that has been evicted into the sparse mapping is a matter of
performing a binary search to find the slot and reading the CSN value.
I like this idea of dense and sparse "areas". Seems like a simple
enough, yet reasonably compact representation that might work well in
practice.
Because snapshots can remain active for an unbounded amount of time
and there can be unbounded amount of active snapshots, even the sparse
mapping can fill up.
I don't think the number of active snapshots matters - after all, they
could all refer the same CSN. So that number shouldn't have any
influence on the size of the sparse mapping.
What does matter is the number of transactions referenced by such a
sparse map. You are of course correct in that this number is equally
unbounded.
To handle that case, each backend advertises its
lowest snapshot number csn_min. When values need to be evicted from
the sparse mapping, they are evicted in CSN order and written into the
CSN log - a series of CSN-XID pairs. Backends that may still be
concerned about those values are then notified that values that they
might need to use have been evicted. Backends with invalidated
snapshots can then convert their snapshots to regular list of
concurrent XIDs snapshots at their leisure.
To convert a CSN based snapshot to XID based, a backend would first
scan the shared memory structures for xids up to snapshot.xmax for
CSNs that are concurrent to the snapshot and insert the XIDs into the
snapshot, then read in the CSN log starting from snapshots CSN,
looking for xid's less than the snapshots xmax. After this the
snapshot can be handled like current snapshots are handled.
Hm, I dislike the requirement to maintain two different snapshot formats.
Also mind that snapshot conversions - however unlikely you choose to
make them - may well result in bursts as multiple processes may need to
do such a conversion, all starting at the same point in time.
Rolling back a transaction
--------------------------
Rolling back a transaction is basically the same as committing, but
the CSN slots need to be stamped with a AbortedCSN.
Is that really necessary? After all, an aborted transaction behaves
pretty much the same as a transaction in progress WRT visibility: it's
simply not visible.
Or why do you need to tell apart aborted from in-progress transactions
by CSN?
Sizing the CSN maps
-------------------
CSN maps need to be sized to accommodate the number of backends.
Dense array size should be picked so that most xids commit before
being evicted from the dense map and sparse array will contain slots
necessary for either long running transactions or for long snapshots
not yet converted to XID based snapshots. I did a few quick
simulations to measure the dynamics. If we assume a power law
distribution of transaction lengths and snapshots for the full length
of transactions with no think time, then 16 slots per backend is
enough to make the probability of eviction before commit less than
1e-6 and being needed at eviction due to a snapshot about 1:10000. In
the real world very long transactions are more likely than predicted
model, but at least the normal variation is mostly buffered by this
size. 16 slots = 128bytes per backend ends up at a 12.5KB buffer for
the default value of 100 backends, or 125KB for 1000 backends.
Sounds reasonable to me.
Sparse buffer needs to be at least big enough to fit CSN slots for the
xids of all active transactions and non-overflowed subtransactions. At
the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out
at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer,
or 203KB total in the default configuration.
A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess
the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.
Performance discussion
----------------------
Taking a snapshot is extremely cheap in this scheme, I think the cost
will be mostly for publishing the csnmin and rechecking validity after
it. Taking snapshots in the shadow of another snapshot (often the case
for the proposed MVCC catalog access) will be even cheaper as we don't
have to advertise the new snapshot. The delayed XID based snapshot
construction should be unlikely, but even if it isn't the costs are
similar to GetSnapshotData, but we don't need to hold a lock.
Visibility checking will also be simpler as for the cases where the
snapshot is covered by the dense array it only requires two memory
lookups and comparisons.
Keep in mind, though, that both of these lookups are into shared memory.
Especially the dense ring buffer may well turn into a point of
contention. Or at least the cache lines holding the most recent XIDs
within that ring buffer.
Whereas currently, the snapshot's xip array resides in process-local
memory. (Granted, often enough, the proc array also is a point of
contention.)
At this point I don't see any major issues with this approach. If the
ensuing discussion doesn't find any major showstoppers then I will
start to work on a patch bit-by-bit.
Bit-by-bit? Reminds me of punched cards. I don't think we accept patches
in that format. :-)
we have a small one in the family.
Congratulations on that one.
Regards
Markus Wanner
On Thu, Jun 6, 2013 at 11:42 PM, Ants Aasma <ants@cybertec.at> wrote:
To refresh your memory the basic idea is to change visibility
determination to be based on a commit sequence number (CSN for short)
- a 8 byte number incremented on every commit representing the total
ordering of commits
I think we would just use the LSN of the commit record which is
effectively the same but doesn't require a new counter.
I don't think this changes anything though.
--
greg
On Fri, Jun 7, 2013 at 2:59 PM, Markus Wanner <markus@bluegap.ch> wrote:
To refresh your memory the basic idea is to change visibility
determination to be based on a commit sequence number (CSN for short)
- a 8 byte number incremented on every commit representing the total
ordering of commits. To take a snapshot in this scheme you only need
to know the value of last assigned CSN, all transactions with XID less
You mean with a *CSN* less than or equal to that number? There's
certainly no point in comparing a CSN with an XID.
That was what I meant. I guess my coffee hadn't kicked in yet there.
than or equal to that number were commited at the time of the
snapshots, everything above wasn't committed. Besides speeding up
snapshot taking, this scheme can also be a building block for
consistent snapshots in a multi-master environment with minimal
communication.
Agreed. Postgres-R uses a CommitOrderId, which is very similar in
concept, for example.
Do you think having this snapshot scheme would be helpful for Postgres-R?
Because snapshots can remain active for an unbounded amount of time
and there can be unbounded amount of active snapshots, even the sparse
mapping can fill up.
I don't think the number of active snapshots matters - after all, they
could all refer the same CSN. So that number shouldn't have any
influence on the size of the sparse mapping.
What does matter is the number of transactions referenced by such a
sparse map. You are of course correct in that this number is equally
unbounded.
Yes, that is what I meant to say but for some reason didn't.
To handle that case, each backend advertises its
lowest snapshot number csn_min. When values need to be evicted from
the sparse mapping, they are evicted in CSN order and written into the
CSN log - a series of CSN-XID pairs. Backends that may still be
concerned about those values are then notified that values that they
might need to use have been evicted. Backends with invalidated
snapshots can then convert their snapshots to regular list of
concurrent XIDs snapshots at their leisure.
To convert a CSN based snapshot to XID based, a backend would first
scan the shared memory structures for xids up to snapshot.xmax for
CSNs that are concurrent to the snapshot and insert the XIDs into the
snapshot, then read in the CSN log starting from snapshots CSN,
looking for xid's less than the snapshots xmax. After this the
snapshot can be handled like current snapshots are handled.
Hm, I dislike the requirement to maintain two different snapshot formats.
Also mind that snapshot conversions - however unlikely you choose to
make them - may well result in bursts as multiple processes may need to
do such a conversion, all starting at the same point in time.
I agree that two snapshot formats is not great. On the other hand, the
additional logic is confined to XidInMVCCSnapshot and is reasonably
simple. If we didn't convert the snapshots we would have to keep
spooling out CSN log and look up XIDs for each visibility check. We
could add a cache for XIDs that were deemed concurrent, but that is in
effect just lazily constructing the same datastructure. The work
needed to convert is reasonably well bounded, can be done without
holding global locks and in most circumstances should only be
necessary for snapshots that are used for a long time and will
amortize the cost. I'm not worried about the bursts because the
conversion is done lock-free, and starting at the same point in time
leads to better cache utilization.
Rolling back a transaction
--------------------------

Rolling back a transaction is basically the same as committing, but
the CSN slots need to be stamped with an AbortedCSN.

Is that really necessary? After all, an aborted transaction behaves
pretty much the same as a transaction in progress WRT visibility: it's
simply not visible.

Or why do you need to tell apart aborted from in-progress transactions
by CSN?
I need to detect aborted transactions so they can be discarded during
the eviction process, otherwise the sparse array will fill up. They
could also be filtered out by cross-referencing uncommitted slots with
the procarray. Having the abort case do some additional work to make
XID assignment cheaper looks like a good tradeoff.
Sparse buffer needs to be at least big enough to fit CSN slots for the
xids of all active transactions and non-overflowed subtransactions. At
the current value of PGPROC_MAX_CACHED_SUBXIDS = 64, the minimum comes
out at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer,
or 203KB total in the default configuration.

A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess
the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.
8 byte alignment for CSNs is needed for atomic access, if nothing else.
I think the size could be cut in half by using a base value for CSNs
if we assume that no xid is active for longer than 2B transactions as
is currently the case. I didn't want to include the complication in
the first iteration, so I didn't verify if that would have any
gotchas.
Performance discussion
----------------------

Taking a snapshot is extremely cheap in this scheme, I think the cost
will be mostly for publishing the csnmin and rechecking validity after
it. Taking snapshots in the shadow of another snapshot (often the case
for the proposed MVCC catalog access) will be even cheaper as we don't
have to advertise the new snapshot. The delayed XID based snapshot
construction should be unlikely, but even if it isn't the costs are
similar to GetSnapshotData, but we don't need to hold a lock.

Visibility checking will also be simpler: for the cases where the
snapshot is covered by the dense array it only requires two memory
lookups and comparisons.

Keep in mind, though, that both of these lookups are into shared memory.
Especially the dense ring buffer may well turn into a point of
contention. Or at least the cache lines holding the most recent XIDs
within that ring buffer.

Whereas currently, the snapshot's xip array resides in process-local
memory. (Granted, often enough, the proc array also is a point of
contention.)
Visibility checks are done lock-free so they don't cause any
contention. The number of times each cache line can be invalidated is
bounded by 8. Overall I think actual performance tests are needed to
see if there is a problem, or perhaps if having the data shared
actually helps with cache hit rates.
At this point I don't see any major issues with this approach. If the
ensuing discussion doesn't find any major showstoppers then I will
start to work on a patch bit-by-bit.

Bit-by-bit? Reminds me of punched cards. I don't think we accept patches
in that format. :-)

we have a small one in the family.
Congratulations on that one.
Thanks,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 7, 2013 at 3:47 PM, Greg Stark <stark@mit.edu> wrote:
On Thu, Jun 6, 2013 at 11:42 PM, Ants Aasma <ants@cybertec.at> wrote:
To refresh your memory the basic idea is to change visibility
determination to be based on a commit sequence number (CSN for short)
- a 8 byte number incremented on every commit representing the total
ordering of commits.

I think we would just use the LSN of the commit record which is
effectively the same but doesn't require a new counter.
I don't think this changes anything though.
I briefly touched on that point in the Hot Standby section. This has
some consequences for locking in CommitTransaction, but otherwise LSN
is as good as any other monotonically increasing value.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
Ants Aasma <ants@cybertec.at> wrote:
Serializable transactions
-------------------------

I won't pretend to be familiar with SSI code, but as far as I can
tell serializable transactions don't need any modifications to
work with the CSN based snapshot scheme. There actually already
is a commit sequence number in the SSI code that could be
replaced by the actual CSN.
That seems quite likely to work, and may be good for performance.
IIRC one of the problems with serializable transactions on hot
standby was that transaction visibility order on the standby is
different from the master.
Pretty much. Technically what SSI does is to ensure that every
serializable transaction's view of the data is consistent with some
serial (one-at-a-time) execution of serializable transactions. That
"apparent order of execution" does not always match commit order.
The problem is that without further work a hot standby query could
see a state which would not have been possible for it to see on the
master. For example, a batch is closed but a transaction has not
yet committed which is part of that batch. For an example, see:
http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions
As that example demonstrates, as long as no serializable
transaction *sees* the incomplete batch with knowledge (or at least
potential knowledge) of the batch being closed, the pending
transaction affecting the batch is allowed to commit. If the
conflicting state is exposed by even a read-only query, the
transaction with the pending change to the batch is canceled.
A hot standby can't cancel the pending transaction -- at least not
without adding additional communications channels and latency. The
ideas which have been bandied about have had to do with allowing
serializable transactions on a hot standby to use snapshots which
are known to be "safe" -- that is, they cannot expose any such
state. It might be possible to identify known safe points in the
commit stream on the master and pass that information along in the
WAL stream. The down side is that the hot standby would need to
either remember the last such safe snapshot or wait for the next
one, and while these usually come up fairly quickly in most
workloads, there is no actual bounding on how long that could take.
A nicer solution, if we can manage it, is to allow snapshots on the
hot standby which are not based exclusively on the commit order,
but use the apparent order of execution from the master. It seems
non-trivial to get that right.
If we use CSNs for visibility on the slave then we can actually
provide identical visibility order.
As the above indicates, that's not really true without using
"apparent order of execution" instead of "commit order". In the
absence of serializable transactions those are always the same (I
think), but to provide a way to allow serializable transactions on
the hot standby the order would need to be subject to rearrangement
based on read-write conflicts among transactions on the master, or
snapshots which could expose serialization anomalies would need to
be prohibited.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Ants,
the more I think about this, the more I start to like it.
On 06/07/2013 02:50 PM, Ants Aasma wrote:
On Fri, Jun 7, 2013 at 2:59 PM, Markus Wanner <markus@bluegap.ch> wrote:
Agreed. Postgres-R uses a CommitOrderId, which is very similar in
concept, for example.

Do you think having this snapshot scheme would be helpful for Postgres-R?
Yeah, it could help to reduce patch size, after a rewrite to use such a CSN.
Or why do you need to tell apart aborted from in-progress transactions
by CSN?

I need to detect aborted transactions so they can be discarded during
the eviction process, otherwise the sparse array will fill up. They
could also be filtered out by cross-referencing uncommitted slots with
the procarray. Having the abort case do some additional work to make
XID assignment cheaper looks like a good tradeoff.
I see.
Sparse buffer needs to be at least big enough to fit CSN slots for the
xids of all active transactions and non-overflowed subtransactions. At
the current level PGPROC_MAX_CACHED_SUBXIDS=64, the minimum comes out
at 16 bytes * (64 + 1) slots * 100 backends = 101.6KB per buffer,
or 203KB total in the default configuration.

A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I guess
the given 16 bytes includes alignment to 8 byte boundaries. Sounds good.

8 byte alignment for CSNs is needed for atomic access, if nothing else.
Oh, right, atomic writes.
I think the size could be cut in half by using a base value for CSNs
if we assume that no xid is active for longer than 2B transactions as
is currently the case. I didn't want to include the complication in
the first iteration, so I didn't verify if that would have any
gotchas.
In Postgres-R, I effectively used a 32-bit order id which wraps around.
In this case, I guess adjusting the base value will get tricky. Wrapping
could probably be used as well, instead.
The number of times each cache line can be invalidated is
bounded by 8.
Hm.. good point.
Regards
Markus Wanner
On 11th June 2013, Markus Wanner wrote:
Agreed. Postgres-R uses a CommitOrderId, which is very similar in
concept, for example.

Do you think having this snapshot scheme would be helpful for
Postgres-R?
Yeah, it could help to reduce patch size, after a rewrite to use such a
CSN.

Or why do you need to tell apart aborted from in-progress
transactions by CSN?

I need to detect aborted transactions so they can be discarded during
the eviction process, otherwise the sparse array will fill up. They
could also be filtered out by cross-referencing uncommitted slots with
the procarray. Having the abort case do some additional work to make
XID assignment cheaper looks like a good tradeoff.

I see.
Sparse buffer needs to be at least big enough to fit CSN slots for
the xids of all active transactions and non-overflowed
subtransactions. At the current level PGPROC_MAX_CACHED_SUBXIDS=64,
the minimum comes out at 16 bytes * (64 + 1) slots * 100 backends
= 101.6KB per buffer, or 203KB total in the default configuration.
A CSN is 8 bytes, the XID 4, resulting in 12 bytes per slot. So I
guess the given 16 bytes includes alignment to 8 byte boundaries.
Sounds good.
8 byte alignment for CSNs is needed for atomic access, if nothing else.
Oh, right, atomic writes.
I think the size could be cut in half by using a base value for CSNs
if we assume that no xid is active for longer than 2B transactions as
is currently the case. I didn't want to include the complication in
the first iteration, so I didn't verify if that would have any
gotchas.

In Postgres-R, I effectively used a 32-bit order id which wraps around.
In this case, I guess adjusting the base value will get tricky.
Wrapping could probably be used as well, instead.

The number of times each cache line can be invalidated is bounded by
8.

Hm.. good point.
We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on this.
If not then what is the reason?
Am I missing something?
Thanks and Regards,
Kumar Rajeev Rastogi
On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on this.
I started looking into this, and plan to work on this for 9.5. It's a
big project, so any help is welcome. The design I have in mind is to use
the LSN of the commit record as the CSN (as Greg Stark suggested).
Some problems and solutions I have been thinking of:
The core of the design is to store the LSN of the commit record in
pg_clog. Currently, we only store 2 bits per transaction there,
indicating if the transaction committed or not, but the patch will
expand it to 64 bits, to store the LSN. To check the visibility of an
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and
compared with the snapshot's LSN.
Currently, before consulting the clog for an XID's status, it is
necessary to first check if the transaction is still in progress by
scanning the proc array. To get rid of that requirement, just before
writing the commit record in the WAL, the backend will mark the clog
slot with a magic value that says "I'm just about to commit". After
writing the commit record, it is replaced with the record's actual LSN.
If a backend sees the magic value in the clog, it will wait for the
transaction to finish the insertion, and then check again to get the
real LSN. I'm thinking of just using XactLockTableWait() for that. This
mechanism makes the insertion of a commit WAL record and updating the
clog appear atomic to the rest of the system.
With this mechanism, taking a snapshot is just a matter of reading the
current WAL insertion point. There is no need to scan the proc array,
which is good. However, it probably still makes sense to record an xmin
and an xmax in SnapshotData, for performance reasons. An xmax, in
particular, will allow us to skip checking the clog for transactions
that will surely not be visible. We will no longer track the latest
completed XID or the xmin like we do today, but we can use
SharedVariableCache->nextXid as a conservative value for xmax, and keep
a cached global xmin value in shared memory, updated when convenient,
that can be just copied to the snapshot.
In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy. In any case, I think we'll need a cut-off
point defined as an XID rather than an LSN for freezing purposes. In
particular, we need a cut-off XID to determine how far the pg_clog can
be truncated, and to store in relfrozenxid. So, we will still need the
concept of a global oldest xmin.
When a snapshot is just an LSN, taking a snapshot can no longer
calculate an xmin, like we currently do (there will be a snapshot LSN in
place of an xmin in the proc array). So we will need a new mechanism to
calculate the global oldest xmin. First scan the proc array to find the
oldest still in-progress XID. That - 1 will become the new oldest global
xmin, after all currently active snapshots have finished. We don't want
to sleep in GetOldestXmin(), waiting for the snapshots to finish, so we
should periodically advance a system-wide oldest xmin value, for example
whenever the walwriter process wakes up, so that when we need an
oldest-xmin value, we will always have a fairly recently calculated
value ready in shared memory.
- Heikki
On 2014-05-12 16:56:51 +0300, Heikki Linnakangas wrote:
On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on this.

I started looking into this, and plan to work on this for 9.5. It's a big
project, so any help is welcome. The design I have in mind is to use the LSN
of the commit record as the CSN (as Greg Stark suggested).
Cool.
I haven't fully thought it through but I think it should make some of
the decoding code simpler. And it should greatly simplify the hot
standby code.
Some of the stuff in here will be influenced by whether your freezing
replacement patch gets in. Do you plan to further pursue that one?
The core of the design is to store the LSN of the commit record in pg_clog.
Currently, we only store 2 bits per transaction there, indicating if the
transaction committed or not, but the patch will expand it to 64 bits, to
store the LSN. To check the visibility of an XID in a snapshot, the XID's
commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.
We'll continue to need some of the old states? You plan to use values
that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
etc?
How do you plan to deal with subtransactions?
Currently, before consulting the clog for an XID's status, it is necessary
to first check if the transaction is still in progress by scanning the proc
array. To get rid of that requirement, just before writing the commit record
in the WAL, the backend will mark the clog slot with a magic value that says
"I'm just about to commit". After writing the commit record, it is replaced
with the record's actual LSN. If a backend sees the magic value in the clog,
it will wait for the transaction to finish the insertion, and then check
again to get the real LSN. I'm thinking of just using XactLockTableWait()
for that. This mechanism makes the insertion of a commit WAL record and
updating the clog appear atomic to the rest of the system.
So it's quite possible that clog will become more of a contention point
due to the doubled amount of writes.
In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy.
It'd possibly also end up being less efficient because you'd visit the
clog for potentially quite some transactions to get the LSN.
Greetings,
Andres Freund
On 05/12/2014 05:41 PM, Andres Freund wrote:
I haven't fully thought it through but I think it should make some of
the decoding code simpler. And it should greatly simplify the hot
standby code.
Cool. I was worried it might conflict with the logical decoding stuff in
some fundamental way, as I'm not really familiar with it.
Some of the stuff in here will be influence whether your freezing
replacement patch gets in. Do you plan to further pursue that one?
Not sure. I got to the point where it seemed to work, but I got a bit of
cold feet proceeding with it. I used the page header's LSN field to
define the "epoch" of the page, but I started to feel uneasy about it. I
would be much more comfortable with an extra field in the page header,
even though that uses more disk space. And requires dealing with pg_upgrade.
The core of the design is to store the LSN of the commit record in pg_clog.
Currently, we only store 2 bits per transaction there, indicating if the
transaction committed or not, but the patch will expand it to 64 bits, to
store the LSN. To check the visibility of an XID in a snapshot, the XID's
commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.

We'll continue to need some of the old states? You plan to use values
that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
etc?
Exactly.
Using 64 bits per XID instead of just 2 will obviously require a lot
more disk space, so we might actually want to still support the old clog
format too, as an "archive" format. The clog for old transactions could
be converted to the more compact 2-bits per XID format (or even just 1 bit).
How do you plan to deal with subtransactions?
pg_subtrans will stay unchanged. We could possibly merge it with
pg_clog, reserving some 32-bit chunk of values that are not valid LSNs
to mean an uncommitted subtransaction, with the parent XID. That assumes
that you never need to look up the parent of an already-committed
subtransaction. I thought that was true at first, but I think the SSI
code looks up the parent of a committed subtransaction, to find its
predicate locks. Perhaps it could be changed, but seems best to leave it
alone for now; there will be a lot of code churn anyway.
I think we can get rid of the sub-XID array in PGPROC. It's currently
used to speed up TransactionIdIsInProgress(), but with the patch it will
no longer be necessary to call TransactionIdIsInProgress() every time
you check the visibility of an XID, so it doesn't need to be so fast
anymore.
With the new "commit-in-progress" status in clog, we won't need the
sub-committed clog status anymore. The "commit-in-progress" status will
achieve the same thing.
Currently, before consulting the clog for an XID's status, it is necessary
to first check if the transaction is still in progress by scanning the proc
array. To get rid of that requirement, just before writing the commit record
in the WAL, the backend will mark the clog slot with a magic value that says
"I'm just about to commit". After writing the commit record, it is replaced
with the record's actual LSN. If a backend sees the magic value in the clog,
it will wait for the transaction to finish the insertion, and then check
again to get the real LSN. I'm thinking of just using XactLockTableWait()
for that. This mechanism makes the insertion of a commit WAL record and
updating the clog appear atomic to the rest of the system.

So it's quite possible that clog will become more of a contention point
due to the doubled amount of writes.
Yeah. OTOH, each transaction will take more space in the clog, which
will spread the contention across more pages. And I think there are ways
to mitigate contention in clog, if it becomes a problem. We could make
the locking more fine-grained than one lock per page, use atomic 64-bit
reads/writes on platforms that support it, etc.
In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy.

It'd possibly also end up being less efficient because you'd visit the
clog for potentially quite some transactions to get the LSN.
True.
- Heikki
On Mon, May 12, 2014 at 10:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-05-12 16:56:51 +0300, Heikki Linnakangas wrote:
On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on this.

I started looking into this, and plan to work on this for 9.5. It's a big
project, so any help is welcome. The design I have in mind is to use the LSN
of the commit record as the CSN (as Greg Stark suggested).

Cool.
Yes, very cool. I remember having some concerns about using the LSN
of the commit record as the CSN. I think the biggest one was the need
to update clog with the CSN before the commit record had been written,
which your proposal to store a temporary sentinel value there until
the commit has completed might address. However, I wonder what
happens if you write the commit record and then the attempt to update
pg_clog fails. I think you'll have to PANIC, which kind of sucks. It
would be nice to pin the pg_clog page into the SLRU before writing the
commit record so that we don't have to fear needing to re-read it
afterwards, but the SLRU machinery doesn't currently have that notion.
Another thing to think about is that LSN = CSN will make things much
more difficult if we ever want to support multiple WAL streams with a
separate LSN sequence for each. Perhaps you'll say that's a pipe
dream anyway, and I agree it's probably 5 years out, but I think it
may be something we'll want eventually. With synthetic CSNs those
systems are more decoupled. OTOH, one advantage of LSN = CSN is that
the commit order as seen on the standby would always match the commit
order as seen on the master, which is currently not true, and would be
a very nice property to have.
I think we're likely to find that system performance is quite
sensitive to any latency in updating the global-xmin. One thing about
the present system is that if you take a snapshot while a very "old"
transaction is still running, you're going to use that as your
global-xmin for the entire lifetime of your transaction. It might be
possible, with some of the rejiggering you're thinking about, to
arrange things so that there are opportunities for processes to roll
forward their notion of the global-xmin, making HOT pruning more
efficient. Whether anything good happens there or not is sort of a
side issue, but we need to make sure the efficiency of HOT pruning
doesn't regress.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, May 12, 2014 at 4:56 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on
this.

I started looking into this, and plan to work on this for 9.5. It's a big
project, so any help is welcome. The design I have in mind is to use the LSN
of the commit record as the CSN (as Greg Stark suggested).
I did do some coding work on this, but the free time I used to work on
this basically disappeared with a child in the family. I guess what I
have has bitrotted beyond recognition. However I may still have some
insight that may be of use.
From your comments I presume that you are going with the original,
simpler approach proposed by Robert to simply keep the XID-CSN map
around for ever and probe it for all visibility lookups that lie
outside of the xmin-xmax range? As opposed to the more complex hybrid
approach I proposed that keeps a short term XID-CSN map and lazily
builds conventional list-of-concurrent-XIDs snapshots for long lived
snapshots. I think that would be prudent, as the simpler approach
needs mostly the same ground work and if it turns out to work well
enough, simpler is always better.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On 2014-05-12 18:01:59 +0300, Heikki Linnakangas wrote:
On 05/12/2014 05:41 PM, Andres Freund wrote:
I haven't fully thought it through but I think it should make some of
the decoding code simpler. And it should greatly simplify the hot
standby code.

Cool. I was worried it might conflict with the logical decoding stuff in
some fundamental way, as I'm not really familiar with it.
My gut feeling is that it should be possible to make it work. I'm too
deep into the last week of a project to properly analyze it, but I am
sure we'll find a place to grab a drink and discuss it next
week.
Essentially all it needs is to be able to represent snapshots from the
past including (and that's the hard part) a snapshot from somewhere in
the midst of an xact. The latter is done by logging cmin/cmax for all
catalog tuples and building a lookup table when looking inside an
xact. That shouldn't change much for CSN based snapshots. I think.
Some of the stuff in here will be influenced by whether your freezing
replacement patch gets in. Do you plan to further pursue that one?

Not sure. I got to the point where it seemed to work, but I got a bit of
cold feet proceeding with it. I used the page header's LSN field to define
the "epoch" of the page, but I started to feel uneasy about it.
Yea. I don't think the approach is fundamentally broken but it touches a
*lot* of arcane places... Or at least it needs to touch many and the
trick is finding them all :)
I would be
much more comfortable with an extra field in the page header, even though
that uses more disk space. And requires dealing with pg_upgrade.
Maybe we can reclaim pagesize_version and prune_xid in some way? It
seems to me prune_xid could be represented as an LSN with CSN snapshots
combined with your freezing approach and we probably don't need the last
two bytes of the lsn for that purpose...
Using 64 bits per XID instead of just 2 will obviously require a lot more
disk space, so we might actually want to still support the old clog format
too, as an "archive" format. The clog for old transactions could be
converted to the more compact 2-bits per XID format (or even just 1 bit).
Wouldn't it make more sense to have two SLRUs then? An SLRU with dynamic
width doesn't seem easily doable.
How do you plan to deal with subtransactions?
pg_subtrans will stay unchanged. We could possibly merge it with pg_clog,
reserving some 32-bit chunk of values that are not valid LSNs to mean an
uncommitted subtransaction, with the parent XID. That assumes that you never
need to look up the parent of an already-committed subtransaction. I thought
that was true at first, but I think the SSI code looks up the parent of a
committed subtransaction, to find its predicate locks. Perhaps it could be
changed, but seems best to leave it alone for now; there will be a lot of
code churn anyway.

I think we can get rid of the sub-XID array in PGPROC. It's currently used
to speed up TransactionIdIsInProgress(), but with the patch it will no
longer be necessary to call TransactionIdIsInProgress() every time you check
the visibility of an XID, so it doesn't need to be so fast anymore.
Whether it can be removed depends on how the whole hot standby stuff is
dealt with... Also, there's some other callsites that do
TransactionIdIsInProgress() at some frequency. Just think about all the
multixact business :(
With the new "commit-in-progress" status in clog, we won't need the
sub-committed clog status anymore. The "commit-in-progress" status will
achieve the same thing.
Wouldn't that cause many spurious waits? Because commit-in-progress
needs to be waited on, but a sub-committed xact surely not?
So it's quite possible that clog will become more of a contention point
due to the doubled amount of writes.

Yeah. OTOH, each transaction will take more space in the clog, which will
spread the contention across more pages. And I think there are ways to
mitigate contention in clog, if it becomes a problem.
I am not opposed, more wondering if you'd thought about it.
I don't think spreading the contention works very well with the current
implementation of slru.c. It's already very prone to throwing away the
wrong page. Widening it will just make that worse.
We could make the
locking more fine-grained than one lock per page, use atomic 64-bit
reads/writes on platforms that support it, etc.
We *really* need an atomics abstraction layer... There's more and more
stuff coming that needs it.
This is going to be a *large* patch.
Greetings,
Andres Freund
On Mon, May 12, 2014 at 6:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
However, I wonder what
happens if you write the commit record and then the attempt to update
pg_clog fails. I think you'll have to PANIC, which kind of sucks.
CLOG IO error while committing is already a PANIC: SimpleLruReadPage()
does SlruReportIOError(), which in turn does ereport(ERROR), while
inside a critical section initiated in RecordTransactionCommit().
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On Mon, May 12, 2014 at 2:56 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
Currently, before consulting the clog for an XID's status, it is necessary
to first check if the transaction is still in progress by scanning the proc
array. To get rid of that requirement, just before writing the commit record
in the WAL, the backend will mark the clog slot with a magic value that says
"I'm just about to commit". After writing the commit record, it is replaced
with the record's actual LSN. If a backend sees the magic value in the clog,
it will wait for the transaction to finish the insertion, and then check
again to get the real LSN. I'm thinking of just using XactLockTableWait()
for that. This mechanism makes the insertion of a commit WAL record and
updating the clog appear atomic to the rest of the system.
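To make the protocol concrete, here is a minimal single-process sketch. All names here (`COMMIT_IN_PROGRESS`, the `clog` dict, the `wal` list) are illustrative stand-ins for the real on-disk SLRU and WAL, not PostgreSQL code:

```python
# Illustrative simulation of the proposed "commit-in-progress" clog protocol.
# The real clog is an on-disk SLRU; this sketch only shows the ordering.

COMMIT_IN_PROGRESS = object()  # magic value written before the commit record

clog = {}   # xid -> commit LSN, COMMIT_IN_PROGRESS, or absent (running/aborted)
wal = []    # simulated WAL; len(wal) after an append serves as the LSN

def commit(xid):
    # 1. Mark the slot so readers know a commit record is being inserted.
    clog[xid] = COMMIT_IN_PROGRESS
    # 2. Insert the commit WAL record (XLogInsert in the real system).
    wal.append(("commit", xid))
    lsn = len(wal)
    # 3. Replace the magic value with the record's actual LSN.
    clog[xid] = lsn
    return lsn

def xid_visible(xid, snapshot_lsn, wait=lambda: None):
    """Is xid's commit visible to a snapshot taken at snapshot_lsn?"""
    status = clog.get(xid)
    while status is COMMIT_IN_PROGRESS:
        wait()                  # real code would use XactLockTableWait()
        status = clog.get(xid)
    if status is None:
        return False            # still running or aborted
    return status <= snapshot_lsn

commit(100)
snap = len(wal)        # snapshot: the current WAL insert position
commit(101)            # commits after the snapshot was taken
print(xid_visible(100, snap), xid_visible(101, snap))  # True False
```

The point of the sketch is step ordering: the magic value is in place before the WAL record exists, so a reader either sees the final LSN or knows to wait, never a half-done state.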
Would it be useful to store the current WAL insertion point along with
the "about to commit" flag so it's effectively a promise that this
transaction will commit no earlier than XXX. That should allow most
transactions to decide if those records are visible or not unless
they're very recent transactions which started in that short window
while the committing transaction was in the process of committing.
--
greg
On 05/12/2014 06:26 PM, Andres Freund wrote:
With the new "commit-in-progress" status in clog, we won't need the
sub-committed clog status anymore. The "commit-in-progress" status will
achieve the same thing.
Wouldn't that cause many spurious waits? Because commit-in-progress
needs to be waited on, but a sub-committed xact surely not?
Ah, no. Even today, a subxid isn't marked as sub-committed, until you
commit the top-level transaction. The sub-commit state is a very
transient state during the commit process, used to make the commit of
the sub-transactions and the top-level transaction appear atomic. The
commit-in-progress state would be a similarly short-lived state. You
mark the subxids and the top xid as commit-in-progress just before the
XLogInsert() of the commit record, and you replace them with the real
LSNs right after XLogInsert().
- Heikki
On 2014-05-12 19:14:55 +0300, Heikki Linnakangas wrote:
On 05/12/2014 06:26 PM, Andres Freund wrote:
With the new "commit-in-progress" status in clog, we won't need the
sub-committed clog status anymore. The "commit-in-progress" status will
achieve the same thing.
Wouldn't that cause many spurious waits? Because commit-in-progress
needs to be waited on, but a sub-committed xact surely not?
Ah, no. Even today, a subxid isn't marked as sub-committed, until you commit
the top-level transaction. The sub-commit state is a very transient state
during the commit process, used to make the commit of the sub-transactions
and the top-level transaction appear atomic. The commit-in-progress state
would be a similarly short-lived state. You mark the subxids and the top xid
as commit-in-progress just before the XLogInsert() of the commit record, and
you replace them with the real LSNs right after XLogInsert().
Ah, right. Forgot that detail...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, May 12, 2014 at 7:10 PM, Greg Stark <stark@mit.edu> wrote:
Would it be useful to store the current WAL insertion point along with
the "about to commit" flag so it's effectively a promise that this
transaction will commit no earlier than XXX. That should allow most
transactions to decide if those records are visible or not unless
they're very recent transactions which started in that short window
while the committing transaction was in the process of committing.
I don't believe this is worth the complexity. The contention window is
extremely short here.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On 12 May 2014 19:27, Heikki Linnakangas wrote:
On 01/24/2014 02:10 PM, Rajeev rastogi wrote:
We are also planning to implement CSN based snapshot.
So I am curious to know whether any further development is happening on this.
I started looking into this, and plan to work on this for 9.5. It's a
big project, so any help is welcome. The design I have in mind is to
use the LSN of the commit record as the CSN (as Greg Stark suggested).
Great !
Some problems and solutions I have been thinking of:
The core of the design is to store the LSN of the commit record in
pg_clog. Currently, we only store 2 bits per transaction there,
indicating if the transaction committed or not, but the patch will
expand it to 64 bits, to store the LSN. To check the visibility of an
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and
compared with the snapshot's LSN.
Won't it be a bit inefficient to look into pg_clog to read an XID's commit
LSN for every visibility check?
With this mechanism, taking a snapshot is just a matter of reading the
current WAL insertion point. There is no need to scan the proc array,
which is good. However, it probably still makes sense to record an xmin
and an xmax in SnapshotData, for performance reasons. An xmax, in
particular, will allow us to skip checking the clog for transactions
that will surely not be visible. We will no longer track the latest
completed XID or the xmin like we do today, but we can use
SharedVariableCache->nextXid as a conservative value for xmax, and keep
a cached global xmin value in shared memory, updated when convenient,
that can be just copied to the snapshot.
I think we can update xmin whenever the transaction whose XID equals
xmin commits (i.e. in ProcArrayEndTransaction).
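The xmin/xmax fast paths in the quoted design can be sketched as follows. This is a hedged illustration only: the `clog_lsn` dict and the snapshot fields are hypothetical stand-ins for the real clog and SnapshotData:

```python
# Sketch: visibility check with xmin/xmax fast paths in front of the clog lookup.
# clog_lsn maps xid -> commit LSN (None = not committed); names are illustrative.

def xid_committed_before(xid, snapshot, clog_lsn):
    # Fast path 1: older than every transaction that could have been running
    # at snapshot time -> surely visible, without touching the clog.
    if xid < snapshot["xmin"]:
        return True
    # Fast path 2: xmax is a conservative bound (e.g. nextXid at snapshot time);
    # anything at or beyond it surely committed after the snapshot.
    if xid >= snapshot["xmax"]:
        return False
    # Slow path: consult the clog and compare the commit LSN with the
    # snapshot's LSN (the WAL insert position when the snapshot was taken).
    lsn = clog_lsn.get(xid)
    return lsn is not None and lsn <= snapshot["lsn"]

clog_lsn = {50: 900, 60: 1100}
snapshot = {"xmin": 55, "xmax": 70, "lsn": 1000}
print(xid_committed_before(50, snapshot, clog_lsn))  # True: below xmin
print(xid_committed_before(60, snapshot, clog_lsn))  # False: LSN 1100 > 1000
print(xid_committed_before(80, snapshot, clog_lsn))  # False: at/above xmax
```

Only the middle band xmin <= XID < xmax pays for a clog lookup, which is why keeping xmin and xmax in SnapshotData still matters under this design.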
Thanks and Regards,
Kumar Rajeev Rastogi
On Mon, May 12, 2014 at 7:26 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy.
To accomplish this, won't an XID-to-CSN map table be required, and how will
it be maintained (i.e., when are entries added to and cleared from that map table)?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 05/13/2014 09:44 AM, Amit Kapila wrote:
On Mon, May 12, 2014 at 7:26 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
In theory, we could use a snapshot LSN as the cutoff-point for
HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
that makes me feel uneasy.
To accomplish this, won't an XID-to-CSN map table be required, and how will
it be maintained (i.e., when are entries added to and cleared from that map table)?
Not sure I understand. The clog is a mapping from XID to CSN. What
vacuum needs to know is whether the xmin and/or xmax is visible to
everyone (and whether they committed or aborted). To determine that, it
needs the oldest still active snapshot LSN. That can be found by
scanning the proc array. It's pretty much the same as a regular MVCC
visibility check, but using the oldest still-active snapshot.
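The vacuum-side check described above can be sketched as follows; the function and argument names are illustrative assumptions, not PostgreSQL's actual API:

```python
# Sketch: deciding whether a deleted tuple is dead to *everyone*, by comparing
# the deleter's commit LSN against the oldest still-active snapshot LSN.

def dead_to_all(xmax_commit_lsn, active_snapshot_lsns):
    """A deleted tuple can be vacuumed once the deleter committed and its
    commit LSN precedes every active snapshot's LSN."""
    if xmax_commit_lsn is None:          # deleter aborted or still running
        return False
    oldest = min(active_snapshot_lsns, default=float("inf"))
    return xmax_commit_lsn <= oldest

print(dead_to_all(500, [900, 1200]))   # True: 500 precedes every snapshot
print(dead_to_all(1000, [900, 1200]))  # False: the snapshot at 900 may see it
```

As Heikki notes, this is just the regular MVCC visibility rule applied with the oldest still-active snapshot.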
- Heikki
On 05/13/2014 08:08 AM, Rajeev rastogi wrote:
The core of the design is to store the LSN of the commit record in
pg_clog. Currently, we only store 2 bits per transaction there,
indicating if the transaction committed or not, but the patch will
expand it to 64 bits, to store the LSN. To check the visibility of an
XID in a snapshot, the XID's commit LSN is looked up in pg_clog, and
compared with the snapshot's LSN.
Won't it be a bit inefficient to look into pg_clog to read an XID's commit
LSN for every visibility check?
Maybe. If no hint bit is set on the tuple, you have to check the clog
anyway to determine if the tuple is committed. And for XIDs older
than xmin or newer than xmax, you don't need to check pg_clog. But it's
true that for tuples with hint bit set, and xmin < XID < xmax, you have
to check the pg_clog in the new system, when currently you only need to
do a binary search of the local array in the snapshot. My gut feeling is
that it won't be significantly slower in practice. If it becomes a
problem, some rearrangement of the pg_clog code might help, or you could build
a cache of XID->CSN mappings that you've already looked up in
SnapshotData. So I don't think that's going to be a show-stopper.
- Heikki
On Tue, May 13, 2014 at 1:59 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
On 05/13/2014 09:44 AM, Amit Kapila wrote:
To accomplish this, won't an XID-to-CSN map table be required, and how will
it be maintained (i.e., when are entries added to and cleared from that map table)?
Not sure I understand. The clog is a mapping from XID to CSN.
The case I was referring to is xmin < XID < xmax, for which you have
mentioned below that you are planning to refer directly to pg_clog. This
is, I think, one of the main places where the new design can have an impact
on performance, but as you said it is better to first do the implementation
based on pg_clog rather than jumping directly to optimizing by maintaining
an XID-to-CSN mapping.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 13 May 2014 14:06, Heikki Linnakangas
The core of the design is to store the LSN of the commit record in
pg_clog. Currently, we only store 2 bits per transaction there,
indicating if the transaction committed or not, but the patch will
expand it to 64 bits, to store the LSN. To check the visibility of
an XID in a snapshot, the XID's commit LSN is looked up in pg_clog,
and compared with the snapshot's LSN.
Won't it be a bit inefficient to look into pg_clog to read an XID's
commit LSN for every visibility check?
Maybe. If no hint bit is set on the tuple, you have to check the clog
anyway to determine if the tuple is committed. And for XIDs older
than xmin or newer than xmax, you don't need to check pg_clog. But it's
true that for tuples with hint bit set, and xmin < XID < xmax, you have
to check the pg_clog in the new system, when currently you only need to
do a binary search of the local array in the snapshot. My gut feeling
is that it won't be significantly slower in practice. If it becomes a
problem, some rearrangement of the pg_clog code might help, or you could build
a cache of XID->CSN mappings that you've already looked up in
SnapshotData. So I don't think that's going to be a show-stopper.
Yes, definitely it should not be a show-stopper. This can be optimized later
by the methods you mentioned, and also by some cut-off technique based on
which we can decide that an XID beyond a certain range will always be
visible, thereby avoiding the look-up in pg_clog.
Thanks and Regards,
Kumar Rajeev Rastogi
On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote:
Some of the stuff in here will influence whether your freezing
replacement patch gets in. Do you plan to further pursue that one?
Not sure. I got to the point where it seemed to work, but I got a
bit of cold feet proceeding with it. I used the page header's LSN
field to define the "epoch" of the page, but I started to feel
uneasy about it. I would be much more comfortable with an extra
field in the page header, even though that uses more disk space. And
requires dealing with pg_upgrade.
FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a
pg_upgrade issue anyway.
I am not excited about a 32x increase in clog size, especially since we
already do freezing at 200M transactions to allow for more aggressive
clog trimming. Extrapolating that out, it means we would freeze every
6.25M transactions.
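The arithmetic behind the 32x and 6.25M figures, as a quick check (a sketch of the extrapolation, assuming the per-XID entry grows from 2 bits to 64 bits and the clog disk budget stays fixed):

```python
# Quick check of the clog-growth arithmetic above.
bits_now, bits_proposed = 2, 64
growth = bits_proposed // bits_now        # 32x larger clog per transaction
freeze_now = 200_000_000                  # current default freeze horizon
freeze_scaled = freeze_now // growth      # same disk budget -> freeze sooner
print(growth, freeze_scaled)              # 32 6250000
```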
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote:
Some of the stuff in here will influence whether your freezing
replacement patch gets in. Do you plan to further pursue that one?
Not sure. I got to the point where it seemed to work, but I got a
bit of cold feet proceeding with it. I used the page header's LSN
field to define the "epoch" of the page, but I started to feel
uneasy about it. I would be much more comfortable with an extra
field in the page header, even though that uses more disk space. And
requires dealing with pg_upgrade.
FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a
pg_upgrade issue anyway.
I am not excited about a 32x increase in clog size, especially since we
already do freezing at 200M transactions to allow for more aggressive
clog trimming. Extrapolating that out, it means we would freeze every
6.25M transactions.
It seems better to allow clog to grow larger than to force
more-frequent freezing.
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
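The summarization step could be sketched as follows; the in-memory list/bitmap layout is an illustration, not the actual SLRU segment format:

```python
# Sketch of the summarization idea: once every XID in a clog range precedes
# the global xmin, collapse per-XID commit LSNs to one bit per transaction.

def summarize(segment):
    """segment: list of commit LSNs (None = aborted). Returns a bytes bitmap,
    1 bit per transaction: 1 = committed, 0 = aborted."""
    bitmap = bytearray((len(segment) + 7) // 8)
    for i, lsn in enumerate(segment):
        if lsn is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)

def committed(bitmap, i):
    return bool(bitmap[i // 8] & (1 << (i % 8)))

old = [100, None, 105, 110, None, 120, 130, 140]  # 8 XIDs at 64 bits each
new = summarize(old)                              # 1 byte after summarization
print(len(new), committed(new, 0), committed(new, 1))  # 1 True False
```

This mirrors the proposal's ordering: write and fsync the summary first, and only then throw away the wide original files.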
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Mon, May 12, 2014 at 06:01:59PM +0300, Heikki Linnakangas wrote:
Some of the stuff in here will influence whether your freezing
replacement patch gets in. Do you plan to further pursue that one?
Not sure. I got to the point where it seemed to work, but I got a
bit of cold feet proceeding with it. I used the page header's LSN
field to define the "epoch" of the page, but I started to feel
uneasy about it. I would be much more comfortable with an extra
field in the page header, even though that uses more disk space. And
requires dealing with pg_upgrade.
FYI, pg_upgrade copies pg_clog from the old cluster, so there will be a
pg_upgrade issue anyway.
I am not excited about a 32x increase in clog size, especially since we
already do freezing at 200M transactions to allow for more aggressive
clog trimming. Extrapolating that out, it means we would freeze every
6.25M transactions.
The default setting imo is far too low for a database of any relevant
activity. If I had the stomach for the fight around it I'd suggest
increasing it significantly by default. People with small databases
won't be hurt significantly because they simply don't have that many
transactions, and autovacuum will get around to cleanup long before the
limit is reached anyway.
It seems better to allow clog to grow larger than to force
more-frequent freezing.
Yes.
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote:
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.
Yes, I like the idea of storing the CSN separately. One reason the
2-bit clog is so good is that we know we have atomic 1-byte writes on
all platforms. Can we assume atomic 64-bit writes?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On 2014-05-15 16:13:49 -0400, Bruce Momjian wrote:
On Thu, May 15, 2014 at 10:06:32PM +0200, Andres Freund wrote:
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.
Yes, I like the idea of storing the CSN separately. One reason the
2-bit clog is so good is that we know we have atomic 1-byte writes on
all platforms.
I don't think we rely on that anywhere. And in fact we don't have the
ability to do so for arbitrary bytes - lots of platforms can do that
only on specifically aligned bytes.
We rely on being able to atomically (as in either before/after no torn
value) write/read TransactionIds, but that's it I think?
Can we assume atomic 64-bit writes?
Not on 32bit platforms.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.
Err, we're proposing a patch to add timestamps to each commit,
/messages/by-id/20131022221600.GE4987@eldon.alvh.no-ip.org
which does so in precisely this way.
The idea that pg_csn or pg_committs can be truncated much earlier than
pg_clog has its merit, no doubt. If we can make sure that the atomicity
is sane, +1 from me.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote:
Andres Freund wrote:
On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.
Err, we're proposing a patch to add timestamps to each commit,
/messages/by-id/20131022221600.GE4987@eldon.alvh.no-ip.org
which does so in precisely this way.
I am not sure where my statements above conflict with committs?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
On 2014-05-15 17:37:14 -0400, Alvaro Herrera wrote:
Andres Freund wrote:
On 2014-05-15 15:40:06 -0400, Robert Haas wrote:
On Thu, May 15, 2014 at 2:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
If the larger clog size is a show-stopper (and I'm not sure I have an
intelligent opinion on that just yet), one way to get around the
problem would be to summarize CLOG entries after-the-fact. Once an
XID precedes the xmin of every snapshot, we don't need to know the
commit LSN any more. So we could read the old pg_clog files and write
new summary files. Since we don't need to care about subcommitted
transactions either, we could get by with just 1 bit per transaction,
1 = committed, 0 = aborted. Once we've written and fsync'd the
summary files, we could throw away the original files. That might
leave us with a smaller pg_clog than what we have today.
I think the easiest way for now would be to have pg_clog with the same
format as today and a rangewise much smaller pg_csn storing the lsns
that are needed. That'll leave us with pg_upgrade'ability without
needing to rewrite pg_clog during the upgrade.
Err, we're proposing a patch to add timestamps to each commit,
/messages/by-id/20131022221600.GE4987@eldon.alvh.no-ip.org
which does so in precisely this way.
I am not sure where my statements above conflict with committs?
I didn't say it did ...
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
So, here's a first version of the patch. Still very much WIP.
One thorny issue came up in discussions with other hackers on this in PGCon:
When a transaction is committed asynchronously, it becomes visible to
other backends before the commit WAL record is flushed. With CSN-based
snapshots, the order that transactions become visible is always based on
the LSNs of the WAL records. This is a problem when there is a mix of
synchronous and asynchronous commits:
If transaction A commits synchronously with commit LSN 1, and
transaction B commits asynchronously with commit LSN 2, B cannot become
visible before A. And we cannot acknowledge B as committed to the client
until it's visible to other transactions. That means that B will have to
wait for A's commit record to be flushed to disk, before it can return,
even though it was an asynchronous commit.
I personally think that's annoying, but we can live with it. The most
common usage of synchronous_commit=off is to run a lot of transactions
in that mode, setting it in postgresql.conf. And it wouldn't completely
defeat the purpose of mixing synchronous and asynchronous commits
either: an asynchronous commit still only needs to wait for any
already-logged synchronous commits to be flushed to disk, not the commit
record of the asynchronous transaction itself.
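The constraint described above can be sketched numerically. A hedged illustration: the function name and the "return the largest earlier synchronous commit LSN" framing are mine, not the patch's actual logic:

```python
# Sketch of the async-commit constraint: with CSN == commit LSN, commits become
# visible strictly in LSN order, so an async commit cannot be acknowledged
# until WAL is flushed past every earlier synchronous commit's LSN (though
# not past its own commit record).

def ack_lsn_required(my_commit_lsn, sync_commit_lsns):
    """Flush LSN the async commit must wait for before acknowledging: the
    largest synchronous commit LSN that precedes its own; 0 means no wait."""
    earlier_sync = [l for l in sync_commit_lsns if l < my_commit_lsn]
    return max(earlier_sync, default=0)

# A commits synchronously at LSN 1, B asynchronously at LSN 2:
print(ack_lsn_required(2, [1]))   # B must wait for the flush to reach A's record
print(ack_lsn_required(2, []))    # no earlier sync commits: 0, no wait needed
```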
Ants' original design with a separate commit-sequence-number that's
different from the commit LSN would not have this problem, because that
would allow the commits to become visible to others out of WAL order.
However, the WAL order == commit order is a nice and simple property,
with other advantages.
Some bigger TODO items:
* Logical decoding is broken. I hacked on it enough that it looks
roughly sane and it compiles, but didn't spend more time to debug.
* I expanded pg_clog to 64-bits per XID, but people suggested keeping
pg_clog as is, with two bits per commit, and adding a new SLRU for the
commit LSNs beside it. Probably will need to do something like that to
avoid bloating the clog.
* Add some kind of backend-private caching of clog, to make it faster to
access. The visibility checks are now hitting the clog a lot more
heavily than before, as you need to check the clog even if the hint bits
are set, if the XID falls between xmin and xmax of the snapshot.
* Transactions currently become visible immediately when a WAL record is
inserted, before it's flushed. That's wrong, but shouldn't be difficult
to fix (except for the async commit issue explained above).
- Heikki
Attachments:
csn-1.patch.gz (application/gzip)
��&4���eA���6A�"R��t�5����#M��wiI �$,�t��z�-3^c����i�b���P/;�n��$���U�g���&���&�����Q������g���v��
�����>,��G��}�?��XL���'c�6�n�F���[V�����9*
�n�75F�kt�o�K9���hS����4V������i�W�ym����qD��B�Td ?k��jL�y�C�<Jc�aN�����d�P[P���c�H�n&t���DL��i�]4� n*5A�A&9�!�����O�`��/�3��O.H���(�0�H�DE�(8x�C���bJv��
?��eY�r��I��� ��_��+��#��<���/{��D������"�}d4�y��~�qQ�
6�������IK);M���]�u>���b�A���g��}������k�����ct��-�P����tG�f��� �JO�I�%�2�TfD,�o��P�4F�����M�tp~|t|�����h@"U��$[�vX���|_i}q/�4����/���I:Q���0��m���Y8���V�HG���)�b���Y/�{I���X�e@M�<3���R��>�w
qdF!B��&�G�8��s�^�G���U�}�{s
����jo�R�U�C������/G����@�hw��bJ ����*d�|�
����c��*����36�P5�TO�e�8�PB�����<���-I��������+�-�<���v�e���`^�GL�~��Y�<g#*s��I'\C�0���)0���cl8��Sw��E��X��$[����v# ��a�%6��dt�Ew��8�5
������CH�Q|e�LD���$���/2!����Q�(�c~���Tnk��l�W2p�O���'�b��
Y}�������X�LNH����r��p6�����_T�sv69>gg����_pA��z����m">YvY(�#������ye��{E�"�����M�1�9�K���Z����Z�������BGf6���2�Hls;[���n����\n���B�\��J�\�!�O;�h�?��m�*���)��v�O��FD���SW���8:<m�r|�������m�����������x`����_#Y_y�����\,��-\�H(���\�hk�S^��}4�P���",$�!%o>97�����
���o�o���v[��������@�a'��6c����������W���� �5����cye`V�}t �F���H��KC�>/�����3����4s�L��m�as��
d��_F��5I��g�����Z'��pt�C]�w��}]�dr5(�8�Z��������M<��R-�|�[3�wL+�^q�V�P�;�8������Q��6�)��HA���{�2�d#s��4�}73���F�`?�J��6�6wW-���������������V#�2����������5�c��;2��-S��/{ ����`��V�{���A���m���/�=��E'��$�>����Mm8!�U��H�<Z���%�0z�`k_'{�5c�aB9�b��u YiG.J�>��}�M`� �Q�-vV�g�)��u���F;3Hw��M�z3���+����n$='R�*�/����~@����U������p��BM��b�l?5j;7�c/1������0@#�QV���x�p7��+�|@n)���)h��%��R�+z!�*�[���h��a4k�X%g��6���o�k����\��]�f��F�������Y~���{;?��2*q�|�� d��������q����2��L����GOl��f��&�s�7 W���/�We5��� ��Sr�DR��P�weK�k�C������`�F�Br(�y3��F=���n46����;-���}�f<}�'%��A2.b��85��k�b��a'��?���k@���!����5����������0oI���S����c��K��x�����U��[z��K\��a��G�� )���H������+%"��d6���a2
���Y_"���$�R�!�E�\���)��A�2�4�N���;�����B�����K�L������#:F���!!cM�zi��a���0�x�%>��r�]�G��V�EM�|�4,t�������f�x��C��Jp:����b��-!��[b�{�e��.�}b]�9�A��
f�E�0w!8��z�r\{�A+A��]
�������5�a�n�5M����v��E�O5��^�N�-0~[PSI's#�0v��\b@/�'�������wX��� @,J���`�y�NE�w[��9�s��zC���������d H�*�t�h��h}WF���l6H$�����H/q�|���\��b��#\�j��}��p��y��Fse&���j�J�t=u=��u��(e_����=�����=@V��2
Z+B��Cp���KS.����.}���*w-���PC��� k8�*Z�Qo��7���v#d-��W�M�h"��f�&��I7�:i�_�l%
�����w7vw[�������������*-e�9%t��mi�l���v1�6 %��1���Y_r�6"d���.�L���4�2�����{S����Z>�'�q����,2��� 8��=�����^�?=����>�(���;���Dv/�����N���zxKO��������\��C�D/^D�U������UD @�o�^����l�|~|����w���yO�J�}��3��n�<��r�bn<�4 ����^�
������\��|#[�4�����U�V�|�w:{�N�����w���<�
{[��������D���dxC+�%�h[L�m
},�!���o�j:3
�*h�,�����>5{7&�|�.�s~�'�������(d>�\)=$',�����CX���m�(�/����9g�Y�c7��{��r?<��UrL���O~�A����U�� �G���k�u��;��f�K�}�B�?�Y+0�f���M�G�����>B��R��5�������n���lu����]�e]->9^c:6�;d"�O���ZDtN��I���&���K������]��,���!�o�=�&����3l ���n�[WP�P:r[���cBm 6�-v������_olP��������<o�x���}3SyG��������U,�|X�(�p>�o^\��>}{|�����m�k���\��7��(�`x��.P�j�#*��lz(�,����)A���<3����O�)�V��S��&]��~���
8
0�G�3������6�b+�&o��f��01{,=4rRx;���/���H|8�����Sag��X6�2�>~�g����iU��x{w�������=�^�-�����������+(W�=" j������W��J��e7[c��t�f�0/�y�Hve��,�^�_������s�p6z�`�6�[bz|�`��T7�F_
�W�u2=���V�]3t �Ca�2�&j�����i��t_
N����'��o�??�|:~�~�����h�RzgnLH�F�������k.���Fg�\*��>�K����q2�vc{���-b�~���)�E����
u;dk�Bs�\u�>@�F�Q����H�����.0ehBp'6w_"=��LF�>���6���i��S���cpxq~YI����_�%l�[mj���Y�����5U:�����F�����������W�NpE����\R���$����;�H(�F��=��,#pU���Db���*5��f��5�a��AH����25����8�F����9�fjsBxr�NP�>�-$�e�������x�l�>�1u����7�2%��e�0���g�Y/��~�(�`����wWH���a��#��&����k-��Uc���k#�g�@���m�-�j�DG�b��frUA��������������@L3$�>uQ��
���7��8.�g��V/�����fGG@�s2��p2������ �{���|��I��]���L�7I2Q��,��V�8��U�>���O^y�#����2�L�xD�ZJJ�]mr�����&�,��&�FM;�i���� �Lj8���;���D�5�|E��8H;���
@y�y[4��,��O�j]7Xo��ac�d�w,�i�Ob({����g0E'�q�915��t�� ��fDX�����2�����/��d�b$_g�4� ����K�T����5�Hx�]Y�`�5�?L���!V�����h+u�����W
V��y���*3��&��`<�vJ�*�a�!9�k�jf��Y
hH����o�]��Ci���(>��9q���a��4]MF��r�h� k��?���,�����Du�����H��$��N��^��S���v]o��GH{����*����)�Y����U�G�����t���c �"R�����`q����99��H��2~�M���0Hl�/�w]H��� h4��<���Yg�e��^���.-d�`�1>
3u~����#�����=oFWWDd�j�\r_�a��������G���3�� �
.8<9��0�����[�1���".4��&�%�x)��V��*�.O:��]��_���^5��OP{�����Bi5l���e?=�1��{H��MX[�+����������c+�������l��@��&M�hN-�~�A���D�(_Y�g����G��P��6dy�>r�{Op������9���w�8di�P(3e�P��o��Z����r���W�O��z��8�0���[cc5!�"<e���c6t����
�7��� ���T4)*��:�E7�6�|��4{���c��.!Fp&]W��Fh���k�6V�����qf��
�hxO9.�g� �����a%�\|q�J#rU�������Vt"���;IW��7���y��5��y��6g� �1^��Ah�*td�(� Gis 0��ec�(z����������Y ��O�!SE@J���-s�{���y�����fq��j]�T&������]�������NM��`/��pgT$�;���D��c8%w]Lwi�)1,�����aS3���z�����dF$C��$�z�czG�@����@��l��lf�o���l���|}���S�4 ����_3QB����&t��$,hB�0I�h!�V������;�Rt��� )�~�N�?�\d�!aX���S���%&�*�GJ�S��Cu�Vq����L�ei�kA�aU�����[kV��=T T�����3"���zc{+�o��469Yg��e�1C,�q�yL�����jr% �fV�'��i�2����`��'Md��� ����� �1��Y��iX�v*��8r�\�S�p
@���n1��B���r'�k�K��b��X�HCY���f99��aX�
�S�gA9��3b"z1��1j�=�����sh�2�X��{�K�Ef����k�4�����E� 6���a��#���~O&#N����L��(����Zz��lu�+Y���W�U���h�S�s�I?����E�W^G���)���5����P#3K�)HR�P�!s��t��������T$n2N��9Y�.B���O��l$P�QA���M�S��K��E�$!G�G����76;���&����<g������4jZ�R������G���0�����B�����O_�59bnl���c�\���\�����xNw�u"���$u��
x+lj�
c�5�$r��1�b�"��P ���q+����24^�V���yM��T/&d���,x���(�9���I�����l��u�b�l�p��A�&��T����J4�Q�5��[�� ��k��
����E�N�-F�I�2�oD����HT�Y��� 3�tO]��D�H(��
�|�|�R�*3��
���������E�_����r��~1����X�&�y�j����j��?�$V3
,�=�$:Q�b�����R9�3�8;�g�0~I���\m�YI�F��rF�O���w��/�u�����&�>��fM�G�G'5/�y���7^ur��0��R �8���1�b�kR=0�p��J30�o:�#�(8�f�=�8�6|Loz3
�^r������+0���� SO:�]�����E��j����a�����`�M�:f��K�d��'��I�,&��
�5Ap9�� �,2�Pl1C�v�'��h����2E�V�[#�"B�����d�6�wY�� j
���a8�zS9������e3���}e�GyY{�9��t��&#������/���xB���T#
�|�V'^A��}4eBV���01+�&M����N�ZB�Pz�:<�Te�����B�
����@����� ��N�X�D����ww%��U��4�����ym�O'����m�����"-k�z�f\���
�/{�9�����b��`D��:,L�]���dL�@�\��.�����������R�O����3��T-l_�Z��652� A\�|���P�� ��]��"GW�Af��h��0��EN�l�gC7����������Zg�2�
��@�l�����,I�E�Ry��K�����{������1[����0%�TS��V��!\UW����w�f"�vI�Z����3�1"�;��R�iJP�*GC��Y�G0�p+\���+�9h�"�D���T�K��0�m�����d���b"4���j�������:�omn )�|D@ �`���%
����
pi�>�� ���@�c ����q�`d[��%����!#�Q���N�U�P��������j�12X�_FFB$e����K�FJ����8���g��o��JMoEi�(b�M5~;x2����\���Ia��o~���d�%O�(�&a��?���lY���$��kG���E�NF}�H(�@"C } ��_A��y������m��3���*�����6��L�Z��tP�� �����P�w�zcdU�!i��!����������}��%�O������0�O5�\����#�o��(��e�^�:�X���6A�*m�f��#.F��"�*&����p#4��q�������X�FwX:���^i�Fhb����_:1�2����+���.�i��B|)B�[��9�:�#��>�1%�5d%A�5��c�������7g�#����I�K�X7m@��Qw�����_?|��9���Bk��8o��!�-����\V3�=`�u� ����r��5%��J�����W�b3CdNb��l�<���){s7�� q#\
#36{�mT\k*��31�Q��S"�5t�=�'��e�!���,6���������N���m�677w�{�C�s�,�x J�������Z�����Z ��{sW!d���y�!^!]��z���`B�5n��K���u��,s�����D�`]�q`q�?��,������E���9��l���������?��+�F�^T{�����ifX��Z��x�������j�O?��h
��j�g\����
�K*n[=�����3x���<�L�������' ���?L!�Z������"xA�K��q�.�]�4�/C���'�.��{te�qr�n*��Tn��d4�:���8� /�����V�6u�9I�l����V6�
��&�=Ec��">�����n4"�u3_K����3�a|�����pT�\9�����^0t���p�9K�����T�<$�cm��Hl��e�V>�a��A�Y4���_�2��M�8(;��7����,����}yGe�x7��T��_VD���V�7G�\�f7o
9/��{T5��^�k��kkc�+��� D�(VA�t�r��\$T�H��
rrQF� �`�D�w R �@�Rd�<�m��?������_��AR��
<�)}f~���07m_6��D�H?��D:j7��@Cq���G��F��������H�WO��44N�MA,� �;��2{�����3�@=�
J��^@d����3�%b�^�����/JH$RS;fO�3u&�Ye�e�-�|�b��qu(�N�-�0��ISW��;��:;��gB-{�N��/�� zAj�`��g2���N�����������(�d(��;��&�
Om2����&��e��*U��[���2�������K�Z9;���-{s-}k�y�j��?.~��+_�{�����_��Rf�ZN���j3������,��s�������zN�N��������W���Mi�Q4�����s��7P��@�T��6��N�+�5�*;�(R����4x����-��IIz�.�M��D^d����4FFt��s�+�2�D �T���?��k��#B�Tlx7�W#���5���g�@�����d,�N�"�g�R�}�%����^�FS�)�s����d��/$(MhB�Ry��o;������D��D�_�A ��8Q��g�M����G{4l���-xsi>���e��8��$����vMgM���i��h1Of0N��
>4��^�)�H'��M2�I�����z,���%�oq�E��$���<�x8�>�C"�J��y��G�-���:rx�S#��i���$�;g������k@�t4sh����7)�&���������z7�G������,����]8V�jF���Q�6�/nZ2��/��V��n�?����wOT� ���R-�"�s�s�.���@���Nua��x
����Z9�1$�)Q�"fu��xMYH�,A���!���l�D�,�T�l�%�F�X�8<���L��Yv�@
`�����IMj#q}������
����hC�����g��ubD[�f�����G��( !��6�K�����Z�zj)�������H����;���g)��[����'u���B��Zl�$�jti���5|Uye�C�
W7}C9s�V�g���L�D �R_����=�fX�?]���c�g�a�C�(�4Cq� �Z�|� d�M�e�e�g�,.�\������2 �"���
e���n�4�n�N= Yo�Z�<�_�P�<nYq�( S9�W�4��Xk��d���JS�����oX�|J�V��c$�f��rD��������+o�������{j ���I*�Qs&��C����������������^��Q�[�� ����D��LjGf�FF��9+�������7.N�v�f�&�6p<Qt��q�ek���03��|3�]������G���)�dB��������v�T)���i(=��
.2������$�R!��~l,��H�!wenu�?8����3�0���hE>����C��3 Q�X-T���h���< ���������mMT����_���p��@������2P��iP�Iso]�kz!m��i�Z_�k��"���c!��).�k�f'��L�����_h��^�m��5�.2?-R\�����/de��v�-�c���p��n���_n���s�=gy=�d�{ls�h�D��36k���X��6G����F��/�?�E�����u�q��/Z7�L����1�f�|UH���U�����6�8ks������B�"��k�����r����~�D�(X���u���+���������(��#��
��u���4���K�!�p!F<����!�� �p9l��3���jA���R���(��(z<��;�y��R��R���/
�TG5��y���� �����?^�*�D��:��e�!V���#������ �$�v�&!S�7�U�E@�w���8�����M,�.P��SPz�P��K�p�T���������N�I�@ ~0��F�VhJt��\ly���DJ'az��9A��P_�
'C���|z/i�Y�a�{_�����#�<cW��t�>2Y�#N-�.�O�Y�s\��>]�����k�L�*����g��{7!q�;
t��N�e��h�S��\�R�q���'N��������5��1��,�m�e'x\���IPu�"h��5��� .�;�9���x��?��=��`
��r����9�~��y���D��;�I���~ne���c�-�����0�������(&��J�.����H����L�B��.U��$��}=�X-]F���b����y�ei��Hn'�q3�����?T)����������>0���8L��RMs���.p�$���Nc�(������?g�`���s�&+����)�p����{��P�q��+����^�4%n��?X������42��+�p�QiT���~�N 'W���/�c�}�Z��z�o>0�,�9�o���)��G�
V hv��#�
Y�G+|uy\�C@Ql��(�����_�`����y���h����z�o��w��$����0�]�C���Fq�_���0j�Y�}�Z�r�������Q�|�o������M������&�t���A_
D����jqF~(�R�;���j#�����g�^p6��#������j*��=��G$��A���'M���;��!Sv�&�8���4,���2BMW�������6����]*�z#�p�����"���}Y[���g�XY8�9���E�#^C�9������Vkwws�rq����&���qT�w"[�.��
1p�=�!��}��uy?]��U |pAgbkD��,m��2U<K[Y���w��V�G~Mk��e9=�x ��(s��#r������{�9�P==��+�vr�������m�`��mV�T�@��p�R����8�������,�����?�y3���g�v���-l��x��U�����LJ�I����Y��7{Mwg��,n��l��rv�F��v�����7�;���*��S�qF��������sP .<�A�M�������P
k�����c���1 ���L ��!�T��\�1P/� ��T|`��,ku��\g�i%������A�����/�K�U��B�����r���4��j���D�)���~P��:3�����/��K$:���Ip5�=g����t��;+��lkp ��}M d/$��ri�Vh��>�bwI��4���3N+
���/�d��w%K)G���q�F�d~T�{I�����>j��@QX�?� ���3�0�%��Eh�L�GB��; � ����GnZ�����K��������`(��Ko�l{A�%��|�S� \���F��/�7����i%_"{�}W2�$�V�������:<�G>5I�r��( R�b� k��+���2�2��`���i:��
� -f���t�V����,y���L1��d�B!V �
�YL+������jWv����x�MF������!��pd_q��9����)l�|��W�>���
�������q�
��G���� 1��f�|�����'�?YA��!;*�_�Z���1�k�Z���_��%�@=�}����������.�2+5@�)7��� h�k��5v6���Fc{;gb]�Xx �qO����������[��$������(@iX�^~ ?OF�'��Es^;�8[0�z�w�IW ���u%w6�%���"<����7�l�y����'{�'���4�Jj
9�v.��L����U�*;�����*"����n���7�?�O��`���g���S0��'��?�N
�8�L0j;���!1�~�cw�Ut�1;����u��������A�pQ*���������b�
G<�����]�?,.E�)d���2��H<P���Q�B�=8YK�T�[o������������2ow�J����r7�|����-����@��n~���,�<[O�w$��WVJmVH�fQ-K7���������/�=�������}��f��-S�Bus�P��BuS\^O8���J���vcc�[�����R 6"��O��r�U�%)�B
�Wa��Z������(�.��������|~T���+�����A�cg^�$�%�X`�jV��
��n�k�[D�uI��$ D�r��m��t���9��[BAzN���!i���-��4m��8�X��]�� Y<�B���s]���i����M�+`��)j�/�GI���A �.f�4����������C� �r9Q9R�����7$�K,=�����a��F&�x_�rgc��veOi�4���B:p?�^1b�[S�rA$�&�A���Ns�)��p��Od��U?H)�0����PH��.�#e���yP����Kh��i\Qd����7��d�a���Q�u�>��B�c�~+'�H�~�~��>���,5p[h�� �[�P�!*�%�x^��W`62��4����'@y��H���p������F%w�/�0�&�����3v7@?�^���N��d���MJ
?'��^y����@A�SdX���������D#L4�w2u)���=Jx^(���:�/����1��EL��IS�ALe�^���Yv��j��s�Z4^�����I���-�O��
����
�����|"lWp1���D������zgf>�c.��{��R����Fh��,����9C�PLz����Q��O��b�S'l9�<
�C������0�a���F�M��� ��vF��D��lB�`}��P/5������A�-�ZU,m!]�>32�d4�T�����'#�D�{|�&��K����(F���������l��KQ�~�8���� ������h�����2������KF�`�zq68)!,8`R�v,�O?�+�y�+��CC�,X�c�Cj����A�0:���(^}���f��P�9=��s�������|���
[s���!���.�^�<W��4D��\����
3[2�k���q�?*$� /'0�4�<kZRkv�+��R�-%��x��b+G�T�#�&!��'����1��GRO^���B �,�@irV7�����oK�q����E���R.IE"%d\�F�%r�����=�9�������F�aiGK��nl���E�������� �~��;3m<d���Rje�V� {�flp����&���,���E�k�p�����&�a������f��9��p5ZB��o�X��
#�p��� 9��VZ}���)��� xN!E��D��=����" S�DJ�f��5��������S�2"������u�S��b�d��%D�s�A�n �J�i���B�J�.l1 5��-"�����$�����T�G��~����b)�)�la�q�sM(��Ac���%ys�ai��q���k��&\���z��-��{k������Va��j|nM��p��M�'���DM�7�NS����j!N�k�['2�����Hyr�����i������F�y Bw�j��aR$�s�.�)���\�z9����`m�����
��;*D�A\��=�iy��<m�}� ��K�q�MCsy���o�/wdk�$�Az5��V�B�������(�}`�Ui�t���>V����R�J�������a���-[C%���Z��$LRW��������(���
�sS�#zhJ�%����;��0�w:�XL��p�4V����,�2 ;�����J�������BUa�*y���3Y�2.��:�DbN�T�<T��,y:f��+(8WH1b8����5���S;�S�`AO��G,�n�������
d����(�*��UEE��2��g��5�O�2�
�O,���$�����~��Z��foss�h~���p~��/�u�s�ds��0{G��
5 a��$z|���%�J�_�hh�o��Jv�%�����S�%��/���S�����k�d�����nI6��P�������W�=*���_�M������1R,��h����?���L e"c����l#�?v�~q�c�H��z�S����L��o�o����~wx�����MJZ�����5Kf��0�������v����h-�
[sJ/%� f��������{a�_�<17]'`�P��V�}K2�BfF+��z��1����SZjFo�L�;�0���<��<���i����������7�a|Sr����P�]c(;��ei������#Zg&�G��$`q�����Y�G���y�mU��l0 �t�������R|��������,|�=����J�y�Y]4r���gc�GAj���3A�����~�V��o��Y�>E���eOS����������B��}[�#�W��8������n�B�����S������Wp,va���o�!����})D|����%����������*��=�
�Vm6K;d�����E���FuhB����o�"$�&*��:-([iP��-�^�&���b���=����x�#<�"C��U�����I+�p��=��W�68Z��V>3w�@RSi��0��&Q(z��������[�]z����q{�wn�,KM��7M�����|c{�/t�~�����E��y��jT��R����r������8�N��Y���wsI�������]�@�R��2f����uA�N�!i�C��9/S�J�6%ks%a���A5���L&����`b�/����^mU���������=x��u?���u?���u������^�����y�,�~cck�UV�G����sg�4C������H�T����a�z8���<$���[������
��I1�)�&AO�qH��^[kq��c�a� 4?����^\v��,��)��Vt6a �/2�}L{���2"8���2�
� �8��_�s�)~�3R������S�oZ#��0j�����z������y@�[���|+,��*8��>�hVWF�
}�I����/��� ���{+h]�1���J�/����on��93�g���vS�mM���a��3ZK��/�Z�6�
4 S�0`[��^�+�~��Pa.�L$��E�@b{$���6�����r��rOr�j!��_G����%�U'�����U���ajN�V�ll��9\|��e���|
�E�;<=9r�5� ���?����Pd��)!P�mY���x����l#�J,#{�[���Vko�`k����e����.b[2�� ��u�����&7![&�)��i�$"(�)��1�t�T��u3[���F�����_��d�����2/C�U��GR�P��0(�k�K�2x}:$`��"�0u���_�O���h��/#���t�0���/�7�������[?cPu�{�d��[���(�r+���!z.,Zm�}����8XF��)����S�&<���g�O��O�^.=�~T[%�N������N���d!�5"-���",��"�W��������I1���3���&�nHM~���Z$v�O�.�J]��
��Uo�@���n��?P��"��|�����}J
�
2���d8� �j�r���q $������e%��Py�S����$�L�� �-#��:A0
vB�$�h�E�^]rk�����M�@�n:5J�`��v|r�6��d�K��yj��U;�&OF"p{��5-�:�a~��7� �@AIE��gM##d �������,WC$q�JY>��T��^��1�O��I�y�y�?J�U���=*�Q�����-}����������p��Ko
�����C�uaC�6d��.=���~�[/~�%Y�h���c2����p=����w^I)E���6� qG�Jw��1*���"N^\��>}{|�^���FJ����` p'd��p�d�d�����>��?�VP�3?�ao����-�<Skt�p����(��$��b�u���4Cx�P��X�@ys<��&��D��:���
�)�C���H�W�\4�_���
��>��|e+T�L��%;\��������U��w8�SE����*V��9VW ?f�������t���kr(c�0�F�O�ke��� 9��
-1{�4�V�*(��"��#i����)�jw� � ��q�O8�$��Z*q�>"�"��51bm#��W���95/��.� ;��s{S���lR8��� �5e)����u?�k�4s��G�I'9����K<������lH�6��9_!+��/�������d�Ir)�j�(����eL��\ ~{oC .�^Jp��vBfn����@��Q����5]�����@�z�KO�9V~�a�v)�4�coH�9��l5���&���X�b7�2fl{j�r�86��|���HRc@�z"���g}K<�cT~
������n�� ��@��cDQ�8,7!
aq���l��W�%P��=|Xe ��*.���)��"�9[���aS$�P��E�����^�I��^�OK�1�����;��<[��0}��j�?�{������(=�&/�����j�e���R
Rt���Xkn�zZ~���d���xU�"�z�X8�����b`ZY���U%
�B\�y�0��y���~��8�����L!1?���&��ZS��pE�5q�H&�M'^V �|-����2�J� �`��>��$���M������|���l>��zJ�S�:��SZs���e��
�w�.�W�i�~�~�F���./��z��]MsC;��������!�%�3���� ���cT�����b���+ �+s�&�Ic!bE�6;yx�����:�s�tN�� ��;l���amd���\�D���2C��xB��%�*��jX3;���
�/r�����+��� ���n��\X�;��o�A�3<U����S��!�I^���`�^j��2���
��
��R�����P�HT���hOd��=�[�0��5�/��D)@\Z��?�������
������\c�\}�t~�~������"�=9��d�b��2����zI�Q��������(k�������'���������a���]<R�\U�i�����Sr��`�������5��,���G����������H[aF�T�Pb -�M-�������=X����C�u���iA�4�XX���xz���I!%���f<+X��trJ}+���w���&D��}�����p^kL�=�����Y�LZk!�g�"_{^�r�2@�)���O6k�$��������)Yk����d�����4������T��������X?�6��+��������7���7�{��L��0��>����qM�����a�R�����ga'����W��O�U�]�'�b`iVCz@� �
Pz���jH���s=b�d#ta�)k;����N��?��sA�`) S���c��P����Y��������V�8��L�V8�� ���p�u�s�����������R��M}��(!���[�Ijz.�\jLF��'����*0���x:���?Bd�0���V<�9�����4��aw-/��U�z���1�aiH��dD�{dU!�/'���+�u������nl�����������R��q���e/Zg���!(���u�8]�^%����K��6��D�n��aL�4�`��kNs�0��g���^#��Mg���6�����w)��V�PX-������9���
_7b��es�� P(.�":�t�v��^�0J�iJ�
�P?x��|d��L�[[f�l��������c����u���#����wS��>�[������t�ws�h�3�|T������/���Y3^.����Z�R�l"xvccW�g;�zx*�3��QA���q�`y�:\���Ra��4�P|��`���-k��(����)B�\,�Y���-��R/�\y��s1O:�61l{���������*��x�=��r�'��*c��2gv�k��"���S�a��B���_��Q�'* ������A����������;0+�N �-7�e)bWp�R0��`c3�.�D�
�lL�)�����������#k.t����C#���"�?dG��g����KW�~kX��x����"�-KKuT����m���bH1#�]����6����)(9eg��Y��������\T /��7���=�\@��[��9
�[���pk�����9c~�p/�����!�&��-��|d�[S����
Lra�"�H4� ��^"H�U�q8e�[����h�>������e��
�|�nW �����o�����&�P�A��l���*s\����������������5�A����)
E�W%�Q���/1����G��wg�����i�,
�8���M�)�l�MQ)��I�fM�m��6�]rZ���@�&n�������D���������k�oPpY�G����8�d�`�!�*A�e�8�y��J.����onl��� o��m��;
U�����M1���S�}=@I>����A��������_�A����0���]����`=�_s����K2� ������UB]@�u�}�� a����nmB�"�-}�����y=q5"� /h���H����u�Z�Y�n��
B���P���q���\���E�����N2��r�������a8�T2����'�M�;������y��t�=_��V�����PP|�bMQ�{Fs�D��Y�����$�=�����4y���/��=�(�d<MG#lug�8r�f
�}A����m���D����3>�0+����M y�u�<<��Q\� �N<T��=f�����C���#�&�C��Z%,[�*lsb)�����w�����]2-N"��;f�DL�����>�\/i�Gb�1�E�a!:8C��F����Q`-��
P���^�� :6��%��P��Z�
� ������H�d���T�r�b��+����Ad��y���E/M�Y>�o��$�K��������G��s��m^�~`X�M��K��tJp7!��t�$o� ��������9�Vc��xL���Z��g�[l1�hR�`>�PW$�d��e[)*��%�v]
_������Z�OrvHC@d�p����2���-����5� �m�t1 �1�!�,����=1�=�[�*���w�s�S�]^�"le�I{����2���Q����,�oF����[�5�R�r6�+n��+q����� �&D1�}��Sz�pR���C;��0�_<�.���z�w#!��?9|s�����������gd(�a'�h�U���P��2���D���Df&�EI�N4sN���
e�����@�������2�������Z����x��|���8_����~}��8�Idz�\�����}���
�����L����
��:��}@����
�l3�0��j��M�]\��;
���z��mqD.Ige�����J���:�G�s�T�8*yK� O��k=��g���Aq'fj���������L0z�S�Tz��tS9���m�`�o���P0�>��(�|:�x���=-���mVd�Z���T���_�e��z�7i��C����V�J�>I�6���\S����<j���)����2��~Vt_������k�.$��"^M�J[�-K���(0�6#�m�LQEP�V��sVS���$������f����/I��\�um#Y�U���5�_ �?���Zn������Z��0����g��5z������r����
��J �h����7;����E����������7.N�vL+q=H�')RE�����lAYC�PJ�d��ia�i0����E�:�&1UP�JCP��7�l��py�]��y@���&,�T��3H����Q�2����q��Xp`B���<A ��d�r8*�� ��J��y�-�e��3:�@4�1��'$bS�����L��e1�V!<�e�]P�4:O�>������P��uiEZ�9�������$oC��]k���`���]q��~=�f���8��5��n�{����p�'���=�'��m<.~e����y-������U/gi��P�E�r��������#}F������^�������1* .�d'��h���hAf�Q]���#����6���n�s.-1R_��U$cC$��?z���W
�H���'�jr������Y��R��7����� �(%XF1�;������H�����>�=a�[�@*S�.s2�G�����}�_jd^��
�����:�B���=C���a�T����|��o��f��}ot;�K�L���@�|����� ?��/���F��_~x���.���-rq]�J<w�2'�>�"�xV��T�ZB�^F0g�&���[��������?�!���p ��F����-�����K$n��^�����bw��������}_��^�������J��#m�G��X���%�z}sG�����a�d�iC#����i��,Qg�U��BQ�QFB<�����6�gv���`=�d��<f���\Dyx��x+�96�1X�.��������jc�a���?�6�B��~)����$��Z�����z<s:�x���k@�6T���&��V���T���jzB?���x��~P��D��\��2V������:���g����g���=�;���/��fD�_��}�����Ev����J��g��k��[�,k���i��������8�E�B��EMIr��w�QB�R#�g0Ln�����/���d�{2�U��N�����P�~���9����R)��t0Dogf�U�&&A�������x+)\��_�;������������?_M�������o� ���X���\�n�q:�l�1] �8�8�!��}b�c��` !�Gl[+j�*H8a-���_\�����Ngb1�C�2�R���h~��e��o?�O�_����)�z�.��3��7�������r�"L���y_f���;�:5��v�E����v�h�H��Y9�&s��j�_A2��B@���N�c���rV�=Q�+l ��)6�Z�b��y��QkW���g|���������bu?��.�����o3����L ������]�}L?CbG��gI�<�6m2b��66p>��(�W�rsS��6��l6�
�:�&��z���v�r�kD/?����9#�_�rx~,�0��d9�j��/L�Y/M���?���hxG���-z�S�a]Pi�L �Z`X��_�������q�������fvgh~���&7Y+�g����� ����;������oo��-f����yi�)�P�J'��>|~|��8����GR�
�l�~I�w�������.Z�������8l�S�����/s��.���]{��������go1t���^5�L�7���9E� �e�}�^>�r|~l:����`�O_����~����U��HJ��O
�l_��)�NW|��r�f�[��yy�h!?�n��cA�����[2�y_�DJ6�Q\!oq�v�%z��
���+|L�p���7���'��y��%�}(��l���>�-W���B���&l`j<�RPda1jD�or���>���������F5],%�y����6b�0���^,�Sl ���~�������n��m��N�.J�R���PF�0�4���hgT��=�d�V
Q���� %���wG��~���5�!��� W�D�(��2��
���x:�>��phn���h��~�!QR�"��#C��}��xA������Rv�*��R�o�R�o~�Ry#'���V��GC�j����>M8�p65�)WA�?��Nf)i�������r�Q�1�<�zb��@(�C�V�m�,w���\"�����#�\a!5� �6J���_��++�u�ujM���a���bT���������kx`L5��h L��f�5�S�I�W���k�����v���|��wU/vU!��!F��+��9���<����e�p9�Zx��fos}o���?H����ex�t0��I�����?�$�}K�D������5�\lb�cQ�$9:a��FgaUS����?Z���K��'